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Abstract 

Internet traffic on a network link can be modeled as a stochastic process. After detecting and quantifying the 
properties of this process, using statistical tools, a series of mathematical models is developed, culminating in 
one that is able to generate “traffic” that exhibits -as a key feature- the same difference in behavior for different 
time scales, as observed in real traffic, and is moreover indistinguishable from real traffic by other statistical 
tests as well. Tools inspired from the models are then used to determine and calibrate the type of activity taking 
place in each of the time scales. Surprisingly, the above procedure does not require any detailed information 
originating from either the network dynamics, or the decomposition of the total traffic into its constituent user 
connections, but rather only the compliance of these connections to very weak conditions. 


1 Introduction 

1.1 The problem 

The object of this paper is the study of the behavior of Internet traffic, as observed on a network link (typically 
shared by a large number of network users), aiming to the identification, quantification, and justification of its salient 
features. Traffic observation here assumes the form of data traces^ i.e. sequences of the form {(t^, where di 

is a data quantity (possibly measured in bits, bytes etc.), and ti is its time stamp, the time when di was observed. 
Throughout the paper, though, it will be more convenient to use binned data traces i = l,...,M = with 

bin size A, defined as: Xf" = X]{j|iA<tj<(i+i)A} ignorance of the exact procedures taking place in the 

network, due either to their complexity, or to lack of information, as well as of the behavior of its users, forces us 
to view X^ as a stochastic process jS). 

This process has been the object of study of numerous researchers. It has unusual and initially unexpected 
properties, such as long range dependence ElEl, nonstationarity EIEI, and different behavior on different time scales 
[EHIHlEnillSI- Moreover, even for the coarser time scales, its marginal distributions may be neither Gaussian, nor 
p-Stable. This is all the more surprising, since the Central Limit Theorems would lead one to expect the opposite, 
because a) the observed traffic is the aggregate result of many users, possibly acting independently ^21IHI b) 
each Xf^ is actually the sum of a large number of d^’s, and c) for small time bins, traffic is uncorrelated. 

Indeed, despite the fact that “bursts” and “spikes” seem to persist for at least 4 orders of bin magnitude (Fig. 
m and 121), which argues against a Gaussian limit, it can be easily seen, using classical fitting techniques, that the 
marginal distributions do not possess the characteristic heavy tails of p-Stable variables either; instead, their tails 
appear to be exponential or Weibull (Fig. |2|). Thus, the process seems “unwilling” to converge to any limit, in 
contrast to most of the processes observed in nature or industry, which do not exhibit this type of “wild” behavior, 
but rather get “attracted” to the (Gaussian or p-Stable) central limit much sooner (this behavior will be quantified 
more precisely in section |S1 below). 

The unusual properties of the traffic are generally attributed to the diversity of user applications, the network 
protocols, and the complexity of the network topology. Indeed, Internet traffic today is a result of numerous 

*A preliminary version of this paper appeared in Proceedings of SPIE Vol. 4868 (2002) under the title: “Some results on the 
multiresolution structure of Internet traffic traces”. 

^The author is a scholar of the Lilian Boudouris Foundation. 
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Figure 1: Traces 94 -(a),(c),(e)- and 97 -(b),(d),(f)- binned with 3 different time bins: 1ms,100ms,Is. The data 
sets 94 and 97 are two of the eight data sets used in this paper; for more details, see section 1.2 below. 
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Figure 2: Traces M6 -(a),(c),(e)- and U8 -(b),(d),(f)- binned with 3 different time bins: 10/is,1ms,10ms. The data 
sets 94 and 97 are two of the eight data sets used in this paper; for more details, see section 1.2 below. 
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Figure 3: Demonstration of the non-Gaussianity of the marginal CDFs of sets 94 -(a)-, and 97 -(b)-, through least 
squares fitting: the very high value of R in both cases shows that the curve is essentially linear, and thus that the 
tail in (a) is exponential and in (b) Weibull. Both sets have been binned using a time bin of 0.1s. 


applications: e-mail, WWW, FTP, music and video streams, etc. The duration and the size of the sessions of these 
applications often vary within several orders of magnitude. Furthermore, the details of the data transfer over the 
network are regulated by protocols, which control, among other things, the transfer rate at each time instant, and 
the fragmentation of the data to be transferred into packets. Finally, the transfer rate heavily depends on the state 
of the network at a particular moment (link availability, congestion, ISP contracts, routing tables,...), which in turn 
influences (and is influenced by) network topology. 

The traffic, though, is typically the result of the activity of a large number of network users. It is expected that 
many of the features present in each user’s own subtrafhc will just average out, leaving no trace in the aggregate 
traffic, whereas others may persist; it is not obvious, however, whether a particular feature will belong to the 
former, or the latter category. 

To sum up, it appears there is a consensus about the factors that shape the traffic, as well as about the 
inadequacy of the Gaussian distribution for the description of the shape of the traffic. This paper explores how 
different factors affect the traffic. This necessitates the development of tools that will calibrate the traffic behavior, 
along with appropriate simulations, on which these tools will be tested. 

1.2 Description of the data sets and the simulations 

This section will provide a short description of both real and simulated traces used. Some of the relevant definitions 
will be given later. 

In what follows, eight data sets (traces), coded 89, 94, 97, L4, L5, M6, M7 and U8 will be used systematically. 
Data sets 89, 94, L4, and L5 are available on the Web, at http://ita.ee.lbl.gov/html/traces.html, and data sets 
M6, M7, and U8 are also available on the Web, at http://pma.nlanr.net These Web pages also contain extensive 
information about the data sets. More precisely: 

• 89 is the trace BGP-Aug89, and it contains “one million packet arrivals seen on an Ethernet at the Bellcore 
Morristown Research and Engineering facility”. It was collected on AUG 29, 1989, starting at 11.25, and lasts 
approximately 3,143 seconds. It is by far the oldest trace used here. The volume exchanged is approximately 
434 MB. 

• 94 is the trace LBL-TGP-3, collected on JAN 20, 1994, between 14.10 and 16.10. The volume exchanged is 
approximately 244 MB. 

• L4 is the trace LBL-PKT-4, collected on Jan 28, 1994, between 14.00 and 15.00. The volume exchanged is 
approximately 131 MB. 
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• L5 is the trace LBL-PKT-5, collected on Jan 28, 1994, between 15.00 and 16.00. The volume exchanged is 
approximately 94 MB. 

• 97 contains 1 hour of the total traffic of users subscribing to WorldNet via modems, measured at the World- 
Net gateway, on JUL 22, 1997, starting at 22.00, and using a time bin of 1ms. The volume exchanged is 
approximately 9 MB. 

• M6 is the trace MRA-1015686621-1, collected on MAR 09, 2002. Approximately, it lasts for 81 seconds, and 
the volume exchanged is 4.8 GB. 

• M7 is the trace MRA-1015764522-1, collected on MAR 10, 2002. Approximately, it lasts for 82 seconds, and 
the volume exchanged is 3.7 GB. 

• U8 is the trace ODU-1015816310, collected on MAR 11, 2002. Approximately, it lasts for 89 seconds, and 
the volume exchanged is 76 MB. 

For data sets 89 and 97, information on their individual connections is not available. 

The sizes and dates of the traces described span quite a broad range: some of them contain only a few MB of 
information, whereas others contain 1000 times as much; some are extrememly recent, whereas others are more than 
a decade old. This variety will allow for the examination of the properties of the traffic, and for their validation, 
in general or specific network conditions. 

A set of simulated traces will also be used: 0-1 is a self-similar process with continuous ON/OFF intervals 
(as in [25) and will be the outcome of model A below; ARR (standing for ARRival process) takes into account 
the fragmentation of the information to be transmitted into packets (model B below); RH (standing for Random 
Heights) is the reward model, and RH HT (standing for Random Heights -1- Heavy Tails) is the a — (3 traffic model 
described in m, ARRRH (standing for ARRival process with Random Heights) is a combination of RH and ARR; 
EXP HD is a sequence of i.i.d. exponential variables, and, similarly, HT HD is a sequence of heavy-tailed i.i.d. 
variables. Finally, a model that combines ARR with the new idea of “levels”, introduced in sectional will also be 
used; its simulations will be labeled by the time scales of the levels, and an S will be added if the levels are “sharp”, 
e.g. 8, 12S, 7S/12/17 etc (see sectionjnj. 

1.3 The State of the Art 

As a result of the intensive research on Internet traffic, many models have been proposed. A requirement imposed 
was that they be parsimonious |2j: they should use a very limited set of parameters, as opposed to the millions of 
details present in a real network. In what follows, some of the most common ones will be presented and discussed. 

The Self-Similar Model (SSM): This is perhaps the simplest model conceptually. Its main assumption is 
that each of the n users generates a data stream or connection Wi{t) , i = l,...,n, the traffic being their sum: 
^ (^) = Sr=i ^ ^ . The users are supposed to act independently, and the data streams are composed 

of intervals of value 0 (where the user is silent), alternating with intervals of value 1 (where the user sends data), 
whose lengths are random, independent, and cross-independent. The degrees of freedom of this model then are just 
two: the distributions of the lengths of the aforementioned intervals. If chosen appropriately, e.g. heavy-tailed^ 
distributions are allowed, they can lead to self-similar simulated traffic m For a more precise statement, see 
Theorem A (Section (5). 

The self-similarity of the traffic, and the heavy-tailed distributions of the intervals of silence and/or activity 
for each user, have been verified in real traces. The violation of the traditional Poisson model PEI, and the long 
range dependence, i.e. slowly decaying autocorrelation (illustrated in Fig. |H|and|n|in section 15), which are both 
consequences of the heavy tail distributions, are perhaps the most striking results. However, this model is not 
entirely successful: the traffic it produces is not as “spiky” as real traffic (Fig. EJ- Moreover, the Energy funetion^ 
which gives, for each time scale, the contribution of the behavior at that scale to the total energy (more details 
in Section 121 or SHI ESI): can clearly distinguish between the simulation and the real traffic. For the simulation, 
the Energy function gives a straight line, whereas the Energy function of the real traffic curves (see Eig. (5 below, 
summarizing the Energy for data as well as simulations according to several models). 

distribution will be henceforth called heavy-tailed iff its variance is infinite, otherwise it will be called light-tailed. This definition 
differs slightly from the one implied in |S]. 
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The “curving of the Energy function” for Internet traffic traces has played an important role as a motivation 
for certain models of Internet data traffic, ever since its first observation in nn It will be used as one of the 
cornerstones of this work as well. As Section (21 will explain, though, a slightly different definition for the Energy 
function than the one in [2| will be used; the Energy function herein used is the average of the translation non¬ 
invariant concept in 121 over all possible choices of the “origin” (t = 0 plays a special role inI2|). This is similar to 
averaging procedures in many other applications of wavelet expansions. The result is that the Energy function, as 
defined in this paper, remains reliable even for very large scales (whereas the standard Energy function produces 
large oscillations and becomes unreliable for scales above 15 in Eig. EJ, making it possible to observe and quantify 
the “curving” or “bumpiness” of the Energy function even at these large scales. The issue will be further analyzed 
in Sectional 

In this paper, simulations according to this model (our model A) will bear the label “0-1”. 

Reward Model (RM): In an attempt to make the SSM “spikier”, it was proposed nnmn to change the values 
of ON-intervals from 1 to some positive random value, which would be kept constant throughout the interval, 
values of different intervals being independent. This would reffect the non-equivalence of the users, in terms of the 
amount of data they can send: some may have faster connections, may run applications generating higher volumes 
of data etc. Needless to say, the distribution of values for the ON-intervals can be itself heavy-tailed. A result 
proved for this model states that, in the limit of infinite users, the (properly normalized) total traffic can be either 
a self-similar or a p-Stable process 

This model resembles real traffic a bit better (compare Eig. 0]and^, however the two types of traffic can 
still be discriminated visually. Moreover, the marginal distribution of a part of the traffic generated by this 
model will always be either p-Stable or Gaussian, whereas in the case of the real traffic different parts may have 
different marginals (see Eig. El in sectional). Also, heavy-tailed value distributions lead to very high spikes, while 
distributions of finite second mean average out in the limit, leading to the same results as SSM (Eig. 0(b),(d), and 
(f)). Thus, the model does not allow for very fine control of the form of the spikes. In addition, its Energy function 
is once more a straight line, but an additional risk is that it can be meaningless^ in the case of a heavy-tailed value 
distribution (see Section I2|). Einally, as noted earlier, real traffic does not have heavy-tailed spikes. 

In this paper, simulations according to this model (our model B) will bear the label “RH”. 

a and fd Traffic model: This model, proposed and studied in m, adopts a different approach, decoupling the 
spikes and the main body of the traffic, and dealing with each of the two separately. It suggests that Internet 
traffic can be viewed as a superposition of two independent processes: the first {fd traffic) is self-similar Gaussian, 
and has no spikes, the second {a traffic) is a train of isolated spikes. The inter-spike intervals are independent and 
approximately exponential, i.e. spike arrivals are approximately Poisson. 

This model was derived from, and calibrated on real traffic traces, and all of the assumptions above were shown 
to hold. No distribution for the spike heights was proposed in |2ni; motivated, however, by the observed exponential 
marginals of the traces considered here, we will be using an exponential one in the simulations. The result of the 
simulation and the real traffic are almost indistinguishable visually (compare Eig. 0 Hand (2)). An example with a 
p-Stable distribution is also provided. 

To sum up, /^-traffic accounts for the long range dependence, and ^-traffic for the spikes. Thus, the model 
matches, to some extent, the appearance, and the slowly decaying autocorrelation of the real traffic (Eig. |H|and 
El). Even better, the marginal distributions of parts of the traffic oscillate between Gaussianity and p-Stability, a 
feature shared by real traffic m Unfortunately, the Energy function of the simulation is again oversimplifying 
reality (Eig. EJ- 

Simulations according to this model will bear the label “RH HT”. 

Infinite Source Poisson Model (ISPM): This is a variant of the SSM. According to this model, the traffic is 
the sum of independent piecewise continuous connections, whose range is {0,1}, and whose support is compact and 
connected. The beginning points of their supports are given by a Poisson process of rate A, and their sizes by a 
heavy tailed distribution of infinite variance (see m and references therein for more details). 

Obviously, the support of the total traffic consisting of a finite number of connections is also finite; thus, contrary 
to the SSM, this model evolves in time, as the number of connections n increases. In order to capture this evolution, 
the traffic can be normalized in time by a parameter T, which denotes time resolution, i.e. the time scale at which 
traffic is viewed. If the two limits n ^ oc and T oo are taken simultaneously, the limit process can be either 
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(e) (f) 

Figure 4: Three instances of a Self-similar and a Reward process: the durations of the ON intervals and, in the 
latter case, the height of the ON intervals, are distributed according to x U (0,1). (a),(c),(e) features H 
= .6, .75, and .9, respectively, whereas in (b),(d),(f) H = .9 fixed for the ON-interval lengths, and H = .4, .6, .9 
for the heights, respectively. OFF-intervals are exponentially distributed with mean equal to their corresponding 
ON-intervals. 
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(c) (d) 

Figure 5: Traffic generated by the a and p model: (a), (b), and (c) feature spikes with exponentially distributed 
heights (with mean 0.5, 0.5, and 0.1 respectively) and exponentially distributed interarrival times (with means 100, 
20, and 100 respectively), (d) features p-Stable spikes having exponential interarrival times with mean 100. For 
the P traffic, the same model was used as in Fig. @1 with H = .9. 


Gaussian or p-Stable, depending on their relative rates (see the discussion in m)- This behavior will appear again 
in Section 0 

Unfortunately, it seems that this limit process cannot be linked back to real traffic in an obvious way. 

It is very striking that the Energy function for these four models are very linear (as shown in Fig. |H| at least 
up to scale 15; at coarser scales. Fig. [SI is no longer accurate, but the modified definition of the Energy function 
given in Section [3 which remains accurate at large scales, the linearity is seen to persist until even the coarsest 
scales). Indeed, as many authors (including us) have found, “local” models (in which the traffic is generated as an 
average over simulated sessions “unrolling” in time) typically produce such nice, linear, but therefore unrealistic 
Energy functions. 

Although each of the models is represented by a single simulation only in Eig. ETb), we did in fact check 
many values for the parameters of these models; for other parameter values, one finds a change in the slope of the 
corresponding Energy function but not in its essentially linear character. 

In simulations for the models listed above, the features of the model seem to control only the slope, or, at best, 
introduce only one slope change from one scale onwards (Eig. inj. Moreover, the Energy function appears to be 
























































































































insensitive to the “spikiness” of the process (compare EXP IID with HT IID in Fig. El^b)), although it seems to 
depend on the autocorrelation. 

The failure of these models to produce curved Energy functions stimulated researchers to seek more “exotic” 
models, such as cascades, and multifractal processes in general ini 123 ESI El These models, however, are not 
linked in any way to a description of user or connection behavior. 

Multifractal models: Multifractal models, unlike all of the preceding models, produce curving Energy functions 
im They also produce nontrivial multifractal spectra 1221. The simplest representatives of this category are the 
random cascades 01221 However, in our view, these models are “global” rather than “local”: the stochastic 
process that generates the simulated sessions is not inspired by network mechanisms or user characteristics, or the 
structure of these in time. To the contrary, these processes typically treat the whole observation time interval as 
an entity, and link probabilities of certain behavior in a subinterval to the probability of other behavior elsewhere, 
without any motivation, other than that it produces Energy functions similar to those observed in real traces. 
Since they do not take into account network mechanisms, at least in a straightforward manner, they offer little 
help towards understanding these mechanisms. For this reason they will be considered no further here. 

Network simulation models: These models adopt a completely different point of view, trying to incorpo¬ 
rate into the model every single detail of the routers, the links, the protocols etc. Examples are the Net¬ 
work Simulator and (NS)(see and references therein), and the Scalable Simulation Framework (SSF) (see 
http://www.ssfnet.org/homePage.html for more information). As expected, they produce very good results, but 
the overwhelming number of parameters violates the parsimony principle, and makes it hard to estimate the impact 
of each of them on the traffic. For these reasons, this approach falls outside the scope of the present paper. 

Summary: In the list of models above, some follow an empirical approach, which attempts to capture properties of 
the traffic, but ignores the underlying mechanisms {a — f3 Traffic, Random Cascades); others present mathematical 
constructions that are motivated by network and user behavior, but produce traffic that only partially corresponds 
to reality (SSM, RH). As far as we know, there exists no model that produces traces a) that have the spikiness 
and long-range dependence of real traffic traces, b) whose marginals match the behavior of the observed marginals, 
which are sometimes close to and at other times deviate far from Gaussianity, and c) with a curving Energy function. 
Obtaining a curving Energy function via a “local” model seems particularly hard: extensive experimentation by 
the authors with simulations of the models described above and in Section o always produced linear, or at best 
essentially piecewise linear graphs (as in Fig. Eh) and l^HT aH. except for our model D described in Section IHl 

Random Cascade models do produce curving Energy functions; they are however not satisfactory, not only 
because they offer no explanation of the underlying mechanism, relating to network protocols or user behavior, but 
also because the resulting traffic still contradicts real traces in various other aspects (auto-correlation, marginal 
distributions etc.). 

1.4 Summary of our results 

The purpose of this paper is to present a model that describes Internet traffic on a network link; the main guideline 
in the construction of this model will be the incorporation, into the SSM, of additional network mechanisms 
and aspects of user behavior that will be identified as responsible for shaping the traffic. It will turn out that 
enriching the SSM with only a few extra features leads to simulations exhibiting spikiness (i.e. marginals), long- 
range dependence, curving of the Energy function, and wide oscillations in the deviation of its marginals from 
Gaussianity that are qualitatively and quantitatively indistinguishable from the same features of real traces. The 
impact of these features on the traffic will be rigorously determined by means of theorems, and the predictions of 
these theorems will be subsequently validated against real traffic. The success of this validation will suggest that 
the factors that have been identified as important for shaping the traffic are indeed important, and may explain 
the behavior of the real traffic singled out by the four criteria above. 

In order to increase clarity, the construction will proceed incrementally in 4 steps, labeled A, B, C, and D. At 
each step, a theorem will state and prove the properties of the model built so far. Step A will be the plain old SSM; 
step B will take into account that information transmission does not take place continuously, but rather in packets 
spaced apart by variable time intervals, which is turn are determined by the Round Trip Times; step C will then 
introduce the Slow Start feature of the TCP protocol into the model, arguing that the small volume of information 
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(a) (b) 

Figure 6: Energy functions, according to the standard definition (Definition 1 in Section [3 for (a) data and (b) 
simulations. They are normalized up to an additive constant, so that they all start at 0. Notice that the Energy 
functions in (b), corresponding to the simulations described in Section are approximately linear. Notice also 
the large oscillations at coarse time scales, caused by a lack of accuracy at coarse scales; Definition 2 will take 
care of this. Einally, note that HT IID and 0-1 produce indistinguishable results, although they have completely 
different properties. 


exchanged by the majority of the connections is such as to allow Slow Start to impact the traffic heavily; and, 
finally, step D will enrich the model with other, longer time scales, characteristic of network activity, originating 
from either user behavior (e.g. switch of application), or protocol actions (e.g. congestion control). 

Step A: The basic model, which will serve as the founding ground for the rest, will be the SSM, introduced and 
studied in 121. Theorem A states and proves its properties. 

Step B: Here, a finer structure will be imposed upon the ON-intervals of the SSM: each value of 1 will be followed by 
a RTT, and RTTs will be modeled by means of random variables. In general, little is known about their distribution 
unini, but in large capacity links of the Internet backbone, packet interarrival times tend to be exponentially 
distributed and independent IHj. Eortunately, this is not a serious issue, since Theorem B below asserts that the 
simulated traffic is insensitive to this distribution, as long as it has finite variance. Moreover, Theorem B states 
that the traffic is actually Gaussian, and fairly uncorrelated at small time scales (Brownian motion), but that the 
correlation grows stronger in larger time scales (causing the process to become self-similar); this prediction is borne 
out by real data (Eig. |H|and El). 

Step C: This step incorporates features of the TCP/IP combination of protocols, responsible for the bulk of 
data transfer over the Internet today, into the model. One of the characteristics of TCP, the Slow Start, roughly 
stipulates that the number of packets sent at each emission time increases geometrically by a factor of 2 (1,2,4,8,...), 
until either loss occurs, a preset maximum is reached, or all data is sent (for more details, see e.g. EH). In detailed 
traces, which show individual connections, the Slow Start is easily discernible; this suggested its incorporation in 
the model, and the exploration of the consequences of its inclusion. Theorem C states that the traffic will tend to 
be either self-similar or p-stable, contingent upon the Slow Start maximum and the number of users. In practice, 
p-Stability cannot be distinguished from other very large deviations from Gaussianity, since observations can only 
last for a finite amount of time. Nevertheless, the fluctuation of the number of active users in time will allow for 
both self-similar regimes and regimes that deviate far from this self-similar behavior within the same trace, as is 
also observed in real traces. An increase of the number of users steers the process towards self-similarity. 

Step D: User connections over the Internet seem to be governed by a hierarchy of time scales (levels)] for example, 
a particular session of data 94 seems to live in six different time scales (Eig. [TTj) . at approximately 2000, 500, 100, 
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40, 5, and 0.1s, and a session of trace M6 in three different time scales (Fig. ITH)) . at approximately 2, 0.4, and 0.01s. 
To the best of our knowledge, there has been no effort yet towards the exact quantification and explanation of these 
levels at different scales, but it is commonly accepted that they exist pomHi A possible explanation could he in 
the recursive structure within Internet applications (e.g. Web browsers opening multiple sessions, FTP applications 
establishing multiple connections, etc., TCP, which sends and waits for receipt acknowledgement, etc.), and the 
human work habits, according to which tasks are broken into subtasks recursively. Analysis of 50 typical (long 
enough) sessions of trace 94 reveals that levels are present in all of them, and are more or less similar. Moreover, 
their statistical properties (mean duration, variance, and range) are similar for other traces of the same time period. 

So far, model A had just 1 level, and models B and C each had 2 levels. This step introduces more levels, at 
larger time scales. Theorem D then proves that such levels cause the Energy function to curve. 

The paper is organized as follows: Section[21 describes the statistical tools used (including the modified definition 
of the Energy function), and sectionsIHlEISl andEldeal with models A,B,C and D respectively. Section 7 concludes 
the paper with a discussion and some applications; in particular, several simple tools are introduced that detect 
the presence and number of levels, and that quantify the behavior of the traffic. 


2 Statistical Tools 

Let represent a time series. In what follows, extensive use of three standard statistical tools will be made: 

1. Empirical marginal distributions: 

n 

Fn{t) 

i=l 


2. Autocorrelations: 


where ^ Ya=i 


Corr{k) = 


E7=li^i - Xn){Xi+k - Xn) 


3. The Kolmogorov distance between two Cumulative Density Eunctions (CDEs), which is defined as the maxi¬ 
mum of the absolute value of their difference: 


dK{Fi,F 2 ) := sup|Fi(x) - F 2 {x)\ 


( 1 ) 


In addition, the Energy function will be used, as well as a more intuitively appealing (in our opinion) variant, the 
p-Averaging function (similar to the Normalized Variance appearing in Q]), introduced below. 

The definition of the p-Averaging function will be completed in two steps: initially, a definition that ensures 
compatibility with the well established Energy function |I5III6j (in a sense to be made clear below) will be proposed. 
Then, a simple observation will lead to the redefinition of both Averaging and Energy functions. 


Definition 1:. Given a stationary sequence for some M = 2"^, define 
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For Mj = M/2-^+^ and p > 0, define the p-Averaging 
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and the Energy 

^ Mj 

Ej+i = ^{dj,kf, j = 0, 1 

^ k=l 

Notice the connection of this definition of the Energy function to the Haar wavelet coefficients as in nniiini- if 
p = 2, the relation between the two functions is elementary: 


log 2 (-Ei) = J - 2 + 2 log 2 j = 1 ,m 

[ 2 ] 

In the sequel, A^- ^ will be used almost exclusively, because it is easier to analyze, and simply connected to the 
commonly adopted Ej] it will be indicated just by Aj and referred to simply as Averaging (function). When p ^ 2 
is considered, it will be stated explicitly. 

This set of definitions has been motivated by the following heuristic: 


Heuristic: Let Xj represent the value of the traffic for the ith time bin, taken to be the initial time bin, and 

1 "" 

assume the sequence is ergodic, (i.e. — Xj EXi). What is the rate of this convergence? The way to determine 


i=l 


the rate is to find a function such that 


\ 




2 \ 


n 

- V W - EXi 
n 


V 

' ^ • -1 

2=1 

/ 


1. Typically, the function is 


1 ^ ^ 

increasing in n, at least for n sufficiently large, expressing the fact that the average — 7 Xi gets closer to E(Xi) 

n 

when n increases. In practice, because E(Xi) is unknown, one can still get a handle on if by comparing two 
different averages: 


E 


-t n 1 ^ 

m Z—rf' ri Z—rf' 


i=l 


i=l 


= if^(n)E 




j;^(n)E 


1 " 
n 


i=l 


+ 


n ^ 


i+n 




i-\-n 


i=l 


j 2 ( \ \ nn ( ( ^ \ ( ^ \ 


I i=i i=i 


i=l i=l 


i=l / \i=l 


= IE E -EE \ = 2 ^ \ ^ E[E(^V,) - E(r,r,+„)] 


( 2 ) 


i=l i=l 


i=l i=l 


i=l i=l 


where Yj = Xj — E(Xi) was set. In (**) we used stationarity. Notice also that (*) proves that 0 < limQE 2. 

We claim that, if the Xj are asymptotically independent^ i.e. Vi,E(X^Xi+j) — E(Xi)^) = E(FWi+i) ^ 0, as 
j ^ 00 (irrespectively of the rate), then Q has a strictly positive limit, and, therefore, that it can be used to 
estimate if (within a multiplicative constant). Because asymptotical independence is satisfied by most processes 
arising in applications (it is extremely unlikely that two random variables arbitrarily long apart will exhibit strong 
dependence), this method can be applied to a wide range of processes, and is, for this reason, very powerful. 

Indeed, as n ^ 00 , for fixed i,j, E(EWn+i) ^ 0, and therefore it becomes infinitely smaller than EiYiYj). The 
RHS of (0) shows then that the limit has to be strictly positive. 
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Thus, if a random variable e;^ is defined by: 


1 


4 


(/c+l)2^' 

(Xi-E(Xi)) 

k2i + l 


= 'ip{2^)\mj^k\ 


then 


Aj+i — 


/ , M, 

“b-si 

\ ^ k=i 


1/2 


M. 


1/2 


rrij^kl 


V’(24 ( M,- 2 


C 


V>( 2 i) 


If Mj is large, Aj+i gives us a way to estimate V’(2'^)- In particular, if ‘tp{n) = n“, then: 


logs = K -aj and logs Ej+i = 2K -1+ j{l- 2a) 

Therefore, the binary logarithms of the Averaging resp. Energy functions of the processes that “average” with 
a rate of n~^ will be straight lines with a slope of —a resp. 1 — 2a. Throughout the paper, the “curvature of the 
Energy or Averaging function” will refer to the curvature of their binary logarithms, plotted as functions of j. 


Remark 1:. In the case of processes with a p-Stable component (heavy tailed marginals), the Energy function, 
as well as the p > 2, are not well defined: their expected value is infinite, because the 5 ;^ do not have a 

second moment. This is the reason why the Energy function fails to distinguish spiky processes (with heavy tailed 
marginals) from ordinary white noise processes, as shown in Eig. IH^b), where the EXP IID simulation, which is 
very close to white noise, and the heavily spiky HT IID simulation are seen to have the same Energy function. The 
few experiments related to distributions with heavy-tailed marginals will be performed with p = 1. 


Remark 2:. Aj has another interpretation that will prove essential later. Namely: 


^i+i — 





2 


Applied to the process Xi \= — / 

^ JiA 


(xi-^Xi+Y.Xi-x(f = ^^Var (xf) - Cou 

1 /'(i+l)A I n 

—p= > Wi(t)dt, the above computation yields 
^Var W^it)dt) - Cov Wi(t)dt, Wi(t)dt) 


Aj+, ^ K 


2^ A 


for some positive constant K, which also absorbs n, as it is now fixed. This formula connects the Energy/p- 
Averaging function of the total traffic to the behavior of one user, who will be “typical” in the mathematical 
sense. 


Remark 3:. Although the name “p-Averaging” may appear to be inappropriately chosen at first sight, it actually 
describes well the fundamental principle of the function, which may be buried under its complicated definition. 

The p-Averaging function endeavors to measure the rate of convergence of averages of n consecutive points of 
the process to its true mean, as n increases, where “rate of convergence” denotes the deviation between the n point 
mean value estimator and the true mean. Unfortunately, the true mean is unknown. One way to circumvent this 
problem is to measure the rate by which two n point estimators converge to each other; the two aforementioned 
convergence rates should be essentially the same, their relative difference being at most 2 at all times (see the 
related discussion in the Heuristic above). 
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This basic idea will then subsequently be elaborated in Definition 2 below, which ensures that the rate gets 
measured accurately, by comparing a large number N of pairs of n point estimators, n increasing geometrically, 
and, in addition, that it gets measured equally accurately for all n, by making N independent of n. These details 
aside, though, it is clear that the p-Averaging function “averages” the traces in order to estimate their mean, or 
rather how difficult it is to estimate their mean, hence the name “Averaging function”. 

It has been implicitly assumed, of course, that Xi is stationary. Although there are many arguments against 
Internet traffic being stationary, it can be assumed to be so over reasonably short time intervals. This could mean 
a couple of hours, according to (HI), so the older data sets (before 1997), with a duration of at most 2 hours, can 
be considered to be stationary. The more recent ones only last for less than 2 minutes, but they are also the 
products of a much faster network. Are they also stationary? We give two reasons why we believe they are: a) 
Non-stationarities, such as the diurnal day cycle, that are an artifact of human activity, do not become faster, as the 
network gets faster, b) In the older traces, consecutive packets in the link seem to be 0.1ms apart, whereas in the 
newer traces they seem to be Ips apart, in order of magnitude, i.e. the network has become faster by approximately 
100 times. So, what needed an hour before needs now approximately a minute, and thus the duration of the old 
and new traces, normalized by the network speed, is the same. 

Under the definitions above, the computation at the scale 2-^ , for either the Energy or the Averaging functions, 
involves a mean value over numbers. Consequently, for j close to m, the sum will comprise insufficiently 

many terms to produce a reliable mean, assuming that such a mean is indeed the ultimate goal of the computation, 
and undesirable oscillations may be observed (see FigEJ. Instinctively, it is expected that averaging over more 
numbers will lead to smoother curves. Thus, a modification of the definition of the Energy and p-Averaging 
functions is in order; the definition that follows will allow for averaging over the maximum number of terms 
possible, i.e. 2"^“^, by using a standard technique of statistics, known as overlapping blocks. 

From the point of view of wavelets, the definition change amounts to averaging the Energy function computations 
over all possible choices of the “origin” for the Haar basis underlying the definition of the dj^k-] this type of averaging 
is now commonly done in situations where the use of a wavelet basis introduces a translation non-invariance that 
was not present earlier. 

Definition 2:. The operator in the above definitions will be formally replaced by E, the expectation 

^ • = ! 

operator. Moreover, for reasons of clarity, the domain of A will be shifted left by 1, i.e. it will start with 0. The 
same name and notation will be kept for the new functions emerging out of these definitions, which, explicitly, take 
the form: 

= (E(mj-i)P)p 

and 

Ej = E(d,-,i)2 

Remember here the stationarity assumption, which implies ruj^k = dj^k = the equality holding in 

distribution. 

How different is the new definition from the old? Stationarity can now be fully exploited, in the sense that there is 
no “beginning of time” for the sequences: namely, it can be assumed that they are ordered circularly (on a ring), not 
linearly (on an interval). Thus, for example, computing A (0), in addition to considering the differences used before, 
i.e. Xo,i-Xo, 2 ,^o, 3 -A'o, 4 , ..., Xq, 2 ^- 1 -^ 0 , 2 ^, one will also take into account Xo, 2 -Xo, 3 , Xo, 4 -Xo, 5 , •••,Xo, 2 ^- 2 - 
Xo, 2 "*^-i 5 Xo, 2 "*^ — Xoq. The circular folding of the indices is justified by the stationarity assumption. In practice, 
this means that the computations will be done according to the formula: 
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HiU) = 


FiiU) = 


2J + i_i / 

hr E E Kij)-F^Mij)\' 


i/p 


2j+i ^ \ 2"^-i 

z=o \ k=l 


20 


1 

j=i 

1 

fE-'- 


l+((/c-l)2-? + i+/+i-l) mod 2^ 


20 


1+((A:—l)2-? + i+2J-|-Z+i—1) mod 2^ 


i=l 


which can be simplified into: 


lb] 


1 


2m+jp 


2 -? + i-l 2^-0- 

E E 


z=o 


k=l 


2^ 


(G^i/cz(i) “ ^ikiU)) 


i=l 


p\ IIp 


^iklU) ^1+((A:—l)2-?'+i+/+i—1) mod 2^^ 

^iklU) = -^1+((/c-1)2J + i+ 2J+Z+2-1) mod2’^ 


and similarly for the Energy function. Observe that the inner mean value (on k) is the previous definition. The 
outer (on 1) is the improvement: now, for every j, the calculation involves a mean value taken on 2"^ points. 

In the case p = 2, an alternative representation exists: 


(3) 

where R is the autocorrelation of the traffic. This representation proves that Aj is really a linear transformation 
of the autocorrelation, so one should expect it to be insensitive to the “spikiness” of the traffic. This is indeed the 
case (Fig. EJ. Note that the can be computed via a fast algorithm of complexity 0 {m2'^); for Aj this can be 
seen directly from 0, since R as well as the convolution in the last term of © can be computed via the EFT. 
For general p, one can use the multiresolution character of what is essentially a computation with Haar wavelet 
coefficients to obtain a 0 {m2^) algorithm (this also works for p = 2). 

Under the new definition, data and simulation Energy functions assume a slightly different form (Fig. [3 compare 
to Fig. El). 




1 -^) [2 i?(i)-i?(2^+i)-i?(2^-i)] 


3 Model A 

This model assumes that the traffic on a link consists of the data that n users are sending, i.e. it is the sum of their 
data streams kU (t), i = 1,..., n. These are modeled as bi-valuate stochastic processes (at a given t, kF (t) = 0 or 1), 
whose paths are piecewise constant. The intervals of value 1 and 0 will be strictly alternating, and their durations 
will be assumed to be i.i.d. for each of the two interval types and for each user, cross-independent between interval 
type and cross-independent between users. 

Defining process A. Let and be i.i.d. sequences of positive random variables with the 

following properties: 


(0^(1) > t) C for p G (1, 2) and 0 < C < cxo 

and 


( 4 ) 


EO}{if < 00 

Define a process W‘^{t) as follows: 


15 











0 1 2 3 4 5 6 7 a 9 10 11 121314 151617 18 19 20 
Tine scale 


Eneigy function for simulations 



rime scale 


^ 0-1 
RH 

^*-0-1+RH 
EXP IID 
RH IID 
->-ARR 
ARRRH 


(a) (b) 



Figure 7: Energy functions, according to Definition 2, for data -(a), (c), and (d)- and simulations -(b). Compare 
to Fig. ini 
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w^it) 


1 if Ef=i^ (Oj(i) + 0^(i)) + Of{k)<t< (o^(i) + 0^(i)) 
0 Otherwise 


and consider W^{t) to be a “stationary” version of i.e. let u be uniformly distributed on the interval 

0, Oj(l) + 0^(1) and define W^{t) = W°^{u^t) Finally, let {W^{t)} be i.i.d. such that W^{t) = where 

= denotes equality in the sense of all finite dimensional distributions of any order, i.e. 


Vn G TV, V(ti,..., t,), p(fF(ti),..., W{tn)) = p{Wi{h ),..., 


We will also use the notions of weak convergence in the sense of the finite dimensional distributions, i.e. finite 
dimensional convergence^ denoted by Wmif) ^ W{t)\ 

VnGTV, V(ti,...,tn), lim p(W^(ti),..., = p(fF(ti),..., W(t,)) 

m^oc 

and stochastic equicontinuity: 


limlimsupP sup \Wn{t) — bFn(<^)| > £ = 0 

(5^0 n^oo Y|t-s|<(5 J 

Finally, a stochastic process W (t) will be called self-similar iS 3 H >0: Vt > 0, W {t) = t^W {1). This is 
certainly not the only possible definition (see [S] for some alternatives), but it is the most usual. 

This setting yielded a seminal mathematical result about the properties of Internet traffic, explaining its self¬ 
similarity in a LAN m Theorem A is closely connected to this. The partial proof offered spells out in more detail 
some steps skipped in m 

/■* 1 " 

Theorem A. LetT^{t) = J -^^W“(s)ds. Then: 


Z^{t) = T“(t) - ET^it) %G^it),te [0,T] 
where G{t) is a centered Gaussian self-similar process with covariance structure: 

Cov (G“(s), G'“(t)) = E W^{x)dx {x)dx'^ - E {x)dx'^ E W^{x)d 3 

Proof. First of all, it is straightforward that: 

^ n n 

Cov(Z“(s), W) = - E E E ((V* - EX/)(X* - EXj)) 


i=l j=l 


= E 


^ W^{x)dx W^{x)dx'^ -e(^J^ W^{x)dx^E(^J^ W^{x)dx 


pt 1 ^ 

Now, fix t € [0,T] and define Xj = / W“(a;)da;. With this notation Z^{t) = ^ E Pt “ EX*). 

Jo \/n 

To show finite dimensional convergence, it is sufficient to prove that: Vfc G G [0, T], Voi,..., Ofe G 

Efci -^(0,<7^), where depends on a's and t's. Easily 


n k 


E = 4 E E - E («i v‘0) = 4 E 

j=l ^ i=l j=l ^ 2=1 


^This technique was also used in 
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where Yf' = 


3 = 1 


W^{x)dx. Clearly F-^’s are i.i.d. and bounded so classical CLT can be invoked. 


The next step is to establish tightness. For this it is sufficient (and necessary) to show stochastic equicontinuity. 
That is, Ve > 0 


(5^0 n^oo 

The following estimate will be needed: 


linilimsupP [ sup \Z^{t) — Z^{s)\ > e \ =0 

K\t-s\<6 I 


(5) 


E(Z“(t) - Z^is)f = VaT{Zl(t) - Z^is)) 

= Var{Xl - Xt) < E(X| - Xff = E Wt{x)dx^ < {t - sf 

A classical “chaining” result (see PHI Thm. 11.1) yields the following: 

Lemma A. Let be a real-valued stochastic process such that V5,t G M : (E(^t — < d{s,t). Then 

VA C M : Card{A) < oo 


e[ sup 16-61) <8/ ^N{A,d,x)dx 

\^d(s,t)<S:s,tEA J Jo 

where d(-, •) is any pseudo distance on M and N{A^ d, x) is minimum number of %alW^ of radius x (under distance 
d) needed to cover A. 


In order to use this lemma we first observe that in our case p = 2, d{s, t) = \t — s\ and 

N{A,d,x) < N{[0,T],d,x) = Tx-^ 

Since the right hand side does not depend on A or n, the restriction Card{A) < oo can be removed (see Theorem 

11.6 in 12HI)- Thus: 


p( sup \Znit) - Znis)\ > e] < e-^E I sup \Zn{t) - Zn{s)\] < - [ ^/N{[0,T],d,x)dx = 

y|t—sl<(5 J J ^ Jo ^ 

This establishes stochastic equicontinuity. 

The proof of the self-similarity of the process has already appeared in m, and therefore it will not be 
reproduced here. 

□ 


4 Model B 

The first criticism to the usual bi-valuate SSM (model A) is the continuity of the ON-intervals. The data transfer 
between computers is quantized (in packets), and although a continuous approximation may be acceptable, if only 
time scales much larger than the typical RTT are of interest, for studies of traffic in the sub-second ranges a more 
detailed model has to be considered. 

Thus, the first modification is the discretization of ON-intervals. Consider the usual SSM, with W-^{t) indepen¬ 
dent (for i = 1,..., n). For a fixed connection (fixed i), the OFF-intervals will have lengths Oj(i), j G A, where the 
O/’s are i.i.d. random variables of finite variance. At the end of an OFF-interval, a heavy tail distributed random 
integer L will be assigned, representing the “load” (number of packets) the user wishes to send in this particular 
session. More precisely. 


3p G (1,2), 30 0 : {L >t)^^C 


( 6 ) 
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At this point, an ON-interval begins, but, in this model, packet emissions assume the form of delta functions, 
expressing mathematically the fact that information is transmitted in individual packets and in particular time 
instants, and the j-th packet emission of the i-th ON-interval is followed by the idle time interval Rij > 0: these 
are random variables i.i.d. in i and j (with < oo), independent of the OFF-interval lenghts and the session 

loads L. After all of the L packets have been sent, an OFF-interval follows, and the cycle restarts. Notice that 
ON-intervals (representing sessions) are heavy-tailed, as consisting of a heavy-tailed number of light tailed time 
intervals. 

The random inter-arrival times Rij^s are designed to address the uncertainty of the length of RTT’s. Since 
exponential distribution has the highest entropy (for any positive random variable), it would be a reasonable choice. 

nt n 

The total traffic from time 0 until time t is defined by T^{t) = / {s)ds. Theorem B below asserts the 

convergence of properly normalized total traffic, under very general conditions on the distributions of the L and 
Ri^j^ to a centered Gaussian process with continuous sample paths. 

Defining process B. Let {Oj(i)}^;L same as and let {I/^(i)}^^ be the same as {0^(^)}^i, but 

with the extra requirement that L^{i) be integer valued. Let also {Rij}'^j=i be an i.i.d. array of random variables 
so that: 


Rij > 0 a.5., E(i?^j)^ < oo, R^q = 0 a.s. 

and 

\im6-^P(R^. <5)<M <oc 


(7) 


m 

Let also 0^{i) = Finally, define formally the “delta” function (5a,(s) with the properties that Sx{s) = 0 

j=i 

px-\-h ^ 

if X ^ s and / 6x{s)ds = 1 for all h > 0. Using these conventions, define further: 

J x — h 


w^{t) 


lit) if t = YHzI {o){i) + Oiii)) + 0)ik) + ELi Ri,i > I < L\k) 
0 Otherwise 


As was the case before, define W^{t) to be a stationary version of and let {W^{t)} be i.i.d. versions of 

Finally, define a constant EVF^ such that E W^(x)dx = EkF^|t — sj. 

For this process, a result analogous to Theorem A holds: 


1 ^ 

Theorem B. Let T^{t) = / ^Y^Wj;{s)ds. Then: 

Jo Zn 

Zlit) = T'^it) - ET'^it) % G\t), for t e [0, T] 


where G^{t) is a eentered Gaussian proeess with eovarianee strueture: 


Cov{G\s),G\t))=E 




Proof. The assertion will be proved under a slightly stronger condition on the R^ j than 0 ; more precisely, it will 
be assumed here (as well as throughout the paper) that 3/3 > 0: 


0 < /? < RZ a.s. (8) 

The proof under the less restrictive assumption o is more technical, and the differences will be commented upon 
in the course of the proof; these comments will be indicated by square brackets [ ]. 
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The finite dimensional convergence follows exactly in the same line as in Theorem A. Namely, Z^{t) = 


1 ^ 

— [Xl — E where Xl = / W^{x)dx. Under the assumption 


0 , Xj are bounded hy (3 and clas¬ 


sical CLT can be used. [A short computation reveals that, under the original assumption O, Xl has exponential 
tails and one can still use the same argument.] 

Stochastic equicontinuity 0 will be proved in two steps: 


Step 1. It is sufficient to show that for every 5 > 0 and 7 > 0 


limlimsupP [ sup |Z^(t) — | > ^ ] =0 

Yi^oo y7n“^/2<|t—s|<(5 J 


(9) 


Proof: Ve > 0 consider the partition A^ = < = 


i = 0,..., K^^/nT > where = s ^SEIUi. For every t. 


Ks^/n 

define functions and 02 • [0? ^ such that 0i (t) =tj<t< tj+i = 02 (^) some j < K^^JnT. 

In other words (j)^ and 02 are closest elements to t from the cover A^ that enclose t. 

Clearly 

< T^nit) < T^m)) 

which implies 

Zl{t) - Ziis) = T^{t) - E (7(t)) - 7(s) + E (7(s)) 

< - E W)) - T^irAs)) + E (TAm))) = 

= TAm)) - E (7(</>? W)) + E {TA4>At))) - E {TAm))) 

+ E {tA4>As))) - tA4>As)) + E (T^(0^(i))) - E {tA4>As))) (10) 


Since E - E (7(0^(i))) = E 


one gets: 
Similarly 


yh f ^ lyh fin / 


WAx)dx = = EWpKe = e/ 8 , 


Zl{t) - zAs) < zAm)) - ZA4>As)) + e/A 
zAs) - Zl{t) < ZAitiAs)) - zA^At)) + efA 


Combining the two inequalities: 

\zAt) - zAs)\ < \zA<PAt)) - zA<liAs))\ + Ki<I^As)) - zArAt))\+s/2 

and since sup | 7(<('2 (^)) - (s))| = sup \zA^ 2 is)) - ZA^iit))\- 

|t —s|<(5 1^ —-s|<(5 


P sup \zAt)-zAs)\>e] <p[ sup \zA4>At))-zA4>As))\>s/4] < 


P sup |Z^(0^(t))-Z^(0?(.))|>5/4 

\Kr^n-i/2<|t-s|<(5 J 


where in (*) we used that |t — s| < ^ 02 (^) — 0i ( 5 ) > 

This proves Step 1. 


Step 2. The goal here is to show that m holds for all 5,7 > 0 


limlim supP 

(5^0 fl - ^QQ 


sup 

\yyn~^/‘^<\t—s\<6 


\zAt)-zAs)\ 


> S 


= 0 
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.t . n 

Using the notation Xl = / W^{x)dx, ^n(^) = ~/= ~ ^ 

Jo Vn 


E (Z^(i) - = E [{Xl - Xt) - E(X| - X!)]j 

< ^E ((X* - Xf) - E(X* - Xf))'‘ + 6 (e ((X* - Xl") - E(X* - Xf))^)^ 

< i (e(X‘ - Xi")4 + 6E(X* - Xlf (E(X‘ - Xf))^) + 6 (E(X* - Xi")^)^ (11) 

Let us examine the distribution of Qs,t = ~ Wi{x)dx. The assumption Q coupled with the 

definition of W^(x) implies that for S small enough, there exists a constant C such that 

Qs^t = 1 with probability C\t — s\ 


Qs,t = 0 with probability 1 — C\t — s\ (12) 

Thus, for S small enough, the above expression m is bounded by C{\t — + \t — s\/n) for some (possibly 

different) constant C. Therefore, for 1 > > |t — s| > 3Ci > 0 such that: 

||Z^(i)-Z)((s)||^^<Ci|i-s|i/2 (13) 

[It is not hard to show that, under the original assumption Q, there exist constants z^i, z ^2 >0 such that, for S 
small enough. 


P{Qs,t = k) <Ui\t- sy P{Qs,t = 0) > 1 - U 2 \t - s|.] 

With this estimate one can easily recover US) under the original assumption O- 
Lemma A can be invoked now. Letting d{s^t) = \t — we have (up to a constant Ci) 


E 


sup 


|Z^(t)-Z^(s)| 


p6 p6 

8 / ^N{[0,T],d,x)dx <8 T'^l'^x-^'^dx = 

Jo Jo 


(14) 


□ 


Probably more interesting is Corollary B, which derives some properties of the limit process: 

Corrolary B. The Gaussian process G^{t) has the following properties. There exist constants 0 < C, < oo, i = 1,2, 
such that: 


Var(G'’(A)) 

A2/P A 

Var(G''’(A)) 

AV2 ^ 


Hi) Vk G N, Cov (G^(A), G^((fc + 1)A) — G^(fcA)) —> fc p” (p was defined in 0) 

A^oo 

iv) Vfc G N, Corr (G'''(A), G''((fc + 1)A) - G\kA)) 0 


Proof. Arguments for i) and in) are essentially the same as in m Namely, it is easy to show that for “large” A 
the processes and behave in a similar way. In particular, it can be shown that there exists 0 < Kp < oo, 
such that: 


E(G'’(A))^ 

E(G“(A))^ 


—> Kp, Vp > 0 
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For ii) and iv) the same approach as in the proof for Theorem B will be used. Namely, details will be presented 
only under assumption 0 , and comments will be given on the extension of the results under the original assumption 

CZI). 

For ii)^ one can show easily (using HI ‘211 1 that for A < /3 


Var (G'’(A)) = Var 



Var (Qo.a) = G2A 


[Under the original assumption Q, estimate dH holds which also implies ii).] 
For iv)^ one gets: 


Corr (G''(A), G\(k + 1)A) - G\k/\)) = 

E (/o^ Wmdt W^{t)dt) - E WKt)dt) E W^,{t)dt) 

Var 

Invoking the distribution of Wi{t)dt for small A, it follows that Var{Qo,A) ~ A and EQq^a ~ A, as 

well as: 


Qo,AQ/eA,(/c+l)A — 0 

The combination of these three estimates yields: 

Corr {G\A),G\{k + 1)A) - G\kA)) ^ A 

[Again, under the original assumption 0, the proof turns out to be more involved. However, it is not hard to 
show (using the estimate C3) that in this case E (Qo,aQ/cA,(/c+i)a) = CA^ + o{A‘^) which implies iv) as well.] 

□ 


The result suggests that the behavior of the process varies according to time scale: for coarse ones, it behaves 
like the SSM, but for fine ones it behaves more like a Brownian motion. This change of behavior is also reflected 
in the autocorrelation. 

The autocorrelations of the data traces follow the same pattern (fast decay at fine time scales, long-range 
dependence at coarser scales; see Fig. |H|and|n|, consistent with the predictions of Corrolary B. This argues against 
“pure” self-similarity of the true traffic, for it behaves differently on different time scales. Moreover, Theorem B 
reveals a mechanism that is responsible for this: the quantized form of data transfer (packets and RTTs). Note 
that the result is independent of the exact distribution of the RTTs, which makes it robust; it can be expected to 
persist even for the haphazard nature of RTT distribution behavior in the network. 

It should be mentioned that similar behavior has been observed before. For instance, in large capacity links of 
the Internet backbone, packet interarrival times tend to be exponentially distributed and independent 

Model B was successful in capturing the autocorrelation of real traffic, but what about spikiness? Actually, 
Theorem B states that model B traffic is only as spiky as the SSM. Model C will introduce a new feature that will 
add spikiness to simulated traffic. 

5 Model C 

5.1 Introduction 

Model B oversimplifies the data transfer between two computers: although it addresses the uncertainty in the 
network and the quantized nature of the transfer, it fails to take into account protocol specifications and connection 
speed variations. Indeed, the most widely adopted transport layer protocol, TCP, has the Slow Start feature, 
according to which the number of packets it sends each time increases geometrically by 2. This geometric increase 
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(c) (d) 

Figure 8: Trace autocorrelations at four different time bins (1ms, 10ms, 100ms, Is): the long-range dependence in 
coarse time scales is obvious. 97 (a) and L5 (d) seem to have very strong periodic components, probably due to 
some monitoring protocol. 


is readily observed in real traffic sessions (Fig. Also, connection speeds can vary from 56K modems to extremely 
fast backbone links. 

Therefore, within the framework of model B, a modification is made, so that the number of packets sent each 
time increases by 2, until either all of the L packets are sent, or a specified maximum Max is reached, after which 
it remains the same. Thus, the sequence of the number of packets sent, instead of 1,1,1,...,!, is now going to 
be 1, 2,4, 8,..., Max, Max, ..., unless L is less than 1 -1- 2 + 4 + ... + Max. In other words, the delta functions will 
no longer have equal weights, but, instead, they will be multiplied by geometrically increasing coefficients. The 
different connection speeds can be modeled through different “packet sizes”, so that the sequence of packets sent 
from a particular user now becomes A, 2A, 4A,..., XMax, ... m Finally, packet losses can by incorporated in the 
model by allowing Max to be a random variable. 

5.2 p-Stable variables and processes 

Theorem C will make use of p-Stable variables. The purpose of this paragraph is to familiarize the reader with this 
quite “exotic” type of random variables, and the random processes they generate. More information can be found 

in [221 . 
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(a) (b) 



(c) 


(d) 


Figure 9: Trace autocorrelations at four different time bins: 10/is, 100/is, 1ms, 10ms for M6, M7, and U8, and 
1ms, 10ms, 100ms, Is for 89. Trace 89 shows a relatively strong correlation irrespectively of the bin used. On the 
contrary, the newer traces seem to be uncorrelated, if binned with a bin other than 10ms. If, however, the 10ms 
bin is used, M7 remains uncorrelated, but M6 and U8 show a weak correlation. 


A distribution is defined to be stable iff it remains invariant —within an affine transformation— under con¬ 
volutions. More precisely, choose n and let Xi,...,A'n follow this distribution; it will be stable iff 3an^bn : 
Xi + ... + Xn = anX + A p-Stablc variable V is represented through its characteristic function, as no closed 
form for its distribution function is generally known: 

E[exp(i6>'F)] = exp(—+ i/aO), 0 < p <2 

It is customary to write that V ^ SpS{a). Closed forms for the distribution are known in the cases where p = 1 
(Cauchy) and p = 2 (Normal). The tails of the distributions, however, are known to follow power laws: 

lim XPP{V > A) = lim A^P(y < -A) = CpCjP 

A^oo A^oo 

A self-similar p-Stable random process with stationary increments and self-similarity factor H is usually denoted 
by Lh,p (see the definition of self-similarity in section EI). The stationary increments requirement, combined with 
self-similarity, imposes that: 

LH,p{t) - Lh,p{s) ~ SpS{\t - s\^(7v{l)) 
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Figure 10: Multiple packet emission according to TCP’s Slow Start, as observed in 4 connections of trace 94 and 
2 connections of trace M6 (remember here that a connection is the subset of the traffic corresponding to specific 
values of sender and receiver host and port numbers): in all cases except (c) Slow Start behaves exactly according 
to the TCP specifications! Slow Start may not always be as “pure” as in figures (a), (b), and (d), though; one also 
sees cases like (c), in which the increase of packet number is present but does not fit the protocol specifications as 
nicely. Nevertheless, our experimental observations correspond to our theoretical predictions. 
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The increments will also be independent iff = 1/p, in which case the process is called a Levy Stable Motion. 

An important difference between p-Stable and Gaussian processes is that, in general, the paths of a p-Stable 
process with p < 2 are discontinuous with probability 1. Notice also the larger number of parameters needed to 
specify a p-Stable variable completely, compared to a Gaussian variable (mean, variance, and p vs. mean and 
variance): it will become clear after the statement and proof of Theorem C, and the discussion afterwards, that 
estimating p from a given data set is by no means easy, but, fortunately, for this paper’s purposes, its value will 
not be needed. 


5.3 Statement and proof of Theorem C 

Before the formal statement of Theorem C, a brief description of the notation is in order: the stream of data 
corresponding to the i-th user, with Max = M, will be T^{t) will denote the total aggregated traffic until 

time t. The function C{s,t) is the covariance structure of the Gaussian limit process, as given in (|T^ . Finally, p 
is the exponent in 0. 


Defining process C. Let and be the same as {O^(0}~i,{^L}S=o and 

{I - 2M + 1)+ 


Given M > 0, let 0(/) = 


log2(/ A M) + 


M 


+ 1, where [X\ stands for the largest integer which 

is smaller than X and (X)+ stands for the positive part of X, as usual. Let ^ (>)= E Ri y Recall also 

i=i 

the definition of the “delta” function bS^^s) and its properties: bS^^s) = 0 if x ^ s and / b6x{s)ds = b for all 

J x — h 

h > 0. With this convention, define: 


k-l 




^ (2' A M)Mt) = (0}(i) + 0^(i)) + O^(fc) + ^ I < <i>{L%k)) 


i=l 


i=l 


Otherwise 


As before, define further W^{t) to be a stationary version of W^{t). Finally, let be i.i.d. versions of 

define a constant EbF^(x) such that 


E 


/ W^{x}dx\ = EW^\b-a\, 


and let 


C{s, t) = Jim T (e {x)dx {x)dx'^ - E {x)dx'^ E {x)dx'^ ) (15) 

where = Var ^ J • Recall also that 3p e (1,2) : (I/°(i) > t) = 0(1), as t —> oo, which is 

another way to say that the LHS of the expression is bounded above and below by a constant. 


Theorem C. 


pt n 

LetT^{t)= / ^WlM{s)ds. Then 

Jo 


i) If M = Mn R oo and M^ < for q < 1/p, then, for fixed t: 


Kit) = 


1 


{T^^{t) - ET^{t))^ G%t), te[Q,T] 


iVn 
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(f 


where cr^ = Var [ j Wi^Mr,{s)ds ) and G^{t) is a eentered Gaussian proeess with eovarianee strueture 
G{s,t). 

a) If Mn > for q> 1/p, then, for fixed t: 

Znit) = ^ K(t) - E (T„^(t))] t e [0, T] 

where P^{t) is a p-Stable proeess. 

Note: we will not pursue further the issues of what the exact value of p for P^{t) is and how it can be estimated 
from a given data set (see the relevant remarks in section 1^^ . 


Proof. First of all, recall the equality 


E 


POO 

{\X\)= P{X > t)dt 
Jo 


( 16 ) 


which will prove repeatedly useful later. 

For part i), finite dimensional convergence has to be established first. An easy computation yields 


Var 


Var 


(/o wr^„{s)ds) 

wrjs)ds) 


7(2 for 0 < 7(2 < 1. 


P - 1 

Using the notation Wf ^ Xj^= / Wf^{x)dx, and Xi^n =- {X/^ — E (V^^^)), it is sufficient to show 

’ ’ ’ Jo ’ ’ ^t,n 


that 


n ^ n 

^ [Nn - E (y,n)] ^ A^(0, 1) 


i=l 


Since the Xi^n are obviously i.i.d., centered and with Var = 1, the above follows easily (using a classical 

Lindeberg argument, see EH), provided that 


-1/2^ 


X;. 


0 . 


Clearly E (V*ra) ^ C < oo, hence the above is bounded (up to a constant) by 


E(|V‘„f) ^ M„e(|X‘„|") 



< c 


M, 


.p/2 


< C'n^ = 


where in (*) we used 1161 and 0. 

The stochastic equicontinuity follows after a computation very similar to the one presented in the proof of 
Theorem B. 

Namely, one needs to estimate 


/ n 

E (Z^(t) - = E ^ ((X*„ - XU - E (X‘„ - X^J) 

\o-nVn 

< {{Xl^ - XU - E {Xl^ - XUT + 4 (e {{Pn 

nCJn CFn ^ 

< ^4 (e (y ,n - + 6E - Xl^f (E (X* 

nCJn \ 


- XU - E {Xl 



(e {xi^-xuy 
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It can be shown, using m that, for S small enough and 5 = 1/p — q, 3C > 0: 




< C\t — s\ 


and 


E - X!P' 


< Cn ^\t — s\ 


Therefore, for \t — s\ > n 3Ci > 0: 


( 17 ) 


As in the proof for Theorem B, this allows for the replacement of the standard requirement for stochastic equicon- 
tinuity 0 with: 


limlim supP 

S ^0 ri .—^oo 


sup |Z-(t)-Z-(s)|>e =0 

<'yn~^/^<\t — s\<d j 


In order to prove the above, one can proceed as in the proof of Theorem B: construct the partition = \ti = 


- — : i = 0,..., K^yfnT^ where = 5 ^8EIT^, and, for every t, define functions (\)\ and 02 • ^ such 

Ks^/n 

that 01 (t) = tj <t < tj-^i = 02(0 for some j < K^^ynT. 

Clearly 

which implies (by essentially the same computation as the one presented in Il'IOIl i 

z-„{t) - z-^{s) < Zl{r 2 {t)) - Z-^irAs)) + e/4 

/ 1 n \ /z: 

Observe now that E(T„^(((.^(t)))-E(T„^(<^"(t))) = E -^ ^ / WlA^dx ) = ^(0^(t)-0^(t))EW^^„ < 

ycTfiVn 

^W^mJKs = e/s. 

This proves part 0 of Theorem C. 




Proof for ii). Let us begin by giving a summary of some relevant convergence results found in m , section 2.6 
(see also section lOl above): 

Lemma D. Let {Xi} he a sequenee of i.i.d. random variables, and suppose there exist two sequenees {on} and 
{bn} sueh that — ——- — bn Z (in whieh ease we say that Xi is in the domain of attraction of Z). 

On 

Then: 

• Z ean only be a p-Stable r.v., where D <p <2. 

• 3(7 > 0 : Un = Cup . 

• p = 2 ^ 'E{Xf) < oc, or, equivalently, p <2 ^ E(Xi) = cx). 

• If p = 2, Z is Gaussian. 

For fixed t, let Yf = / Wf^{x)dx, where stands for with Mn = oo. A straightforward calculation 

00 ’ 

of CiYf) shows that E[(T/)^] = oo; it follows then from Lemma D that Y/ is in the domain of attraction of a 
p-Stable random variable, i.e. 


T-l/P 


y](y/-E(y/)) ^p-Stable 


r.v. 


i=l 


It is also a classical result (see I2ni) that, if Y/ is replaced with its truncated version Y/^^ = the 

above convergence still holds. The result now follows since the tail behavior of Y/^ is the same as for X}^ = 


/ 


{x)dx. 


□ 
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5.4 An interesting prediction: oscillation between Gaussianity and p-Stability 

According to Theorem C, and contingent upon the relation between n and the maximum window size M^, the limit 
process can be either Gaussian or p-Stable. Notice, however, that this condition does not depend on the length T 
of the time interval under consideration. Therefore, it can be conjectured that the result applies “locally”. This, 
in turn, suggests two possible mechanisms causing oscillations between Gaussianity and p-Stability. 

On the one hand, although formally n users are active in [0, T], their actual number may fluctuate, if examined 
on smaller time intervals (Fig. IT^ . Let then (t) be the number of users, among the total n users of the 
simulation, who are in an ON-interval at time t. If {Nn{t)y < M^, the local behavior around t will tend to be 
p-Stable, if {Nn{t)y > M^, it will tend to be Gaussian. This oscillation is observed in practice (Fig. IT^ . and mere 
visual inspection reveals that the number of users and the Gaussianity of the process are well correlated (Table CJ. 
The fluctuation of the number of active users in time can apparently be well modeled by model A. Finally, notice 
that the number of bins used to compute the marginals is not responsible for this oscillation, as it is kept fixed 
throughout the computation. 

On the other hand, itself can change, too. This happens actually in real networks built on TGP/IP, where, 
according to the protocol specifications, packet loss halves the window size. In other words, according to Theorem 
G, the more congested the network is, the more Gaussian the traffic should look like. This has indeed been confirmed 

ESI El- 

As a consequence, if incorporation of packet losses into this model is required, can be allowed to be different 
for each user, and randomly selected, but constant. 

5.5 Is oscillation between Gaussianity and p-Stability a reality? 

Simulations of model G seem to be very successful in resembling the true traffic visually, and in capturing its 
long range dependence (see Fig. nsiiii and EJ. To test whether the other prediction of Theorem G, namely the 
oscillation between Gaussianity and p-Stability, is indeed seen in real traffic, the following experiment was designed. 
For simplicity, the description will be based on the 94 trace, when it comes to choosing values of parameters, but 
it can be easily translated into the context of newer traces, using the observation that transmission appears to be 
100 times faster. 

Start by binning the trace using a bin of 0.1s. Since under a time bin of 0.1ms it is mostly true that at most 
one packet falls into a particular time bin, the 0.1ms bin size can be considered a “natural” time bin for the 
trace; taking a marginal distribution of the trace at this bin size would yield the marginal of the packet size itself. 
Gonsequently, the value of every bin is the sum of 1000 “natural” bins, i.e. the sum of 1000 identically distributed 
random variables, and actually distributed as the packet size, if 0 is allowed to be a legitimate packet size. One 
would hope that this number is sufficiently large to yield a reasonably accurate convergence to the Gaussian limit 
predicted by the GLT, if convergence to a Gaussian r.v. were to hold, or yield results that are observably at least 
non-Gaussian, if convergence to a p-Stable limit were to hold. Actual convergence to the p-Stable limit is too much 
to ask for, because a) a different normalization is required for that, compared to the Gaussian limit, and b) infinite 
variance is impossible to capture with a finite trace, so that a p-Stable variable cannot be distinguished from a 
large variance variable, in practice (see Fig. irTT e ) and (f)). 

In order to examine the trace behavior locally, we divide the trace using consecutive, equisized, non-overlapping 
windows and take the normalized marginal (corresponding to mean 0 and variance 1) within each window. We 
then compute the Kolmogorov distance (see Q) between this marginal and a normal ^"(0,1) distribution, for each 
window; graphing this Kolmogorov distance as a function of the bin label illustrates changes in time of the local 
behavior. As in the case of the windowed Fourier transform, the window size is the result of a trade-off: small 
windows will yield well localized inaccurate results, whereas large windows will yield poorly localized accurate 
results. A size of 512 bins seemed a good choice, thus ending up with approximately 140 points in the graph (the 
trace consists of approximately 72000 bins). The resulting Kolmogorov distance graphs for the data sets are shown 
in Fig. m a), (c), and (d). In some cases, the Kolmogorov distance is small throughout (M6, M7, L4); in others it 
oscillates violently. 

In order to calibrate these results, we performed the same computations on realizations of known distributions; 
the results are shown in Fig. irTT ab (b), and (c). 

In Fig. ITlT al we compare a realization of the normal distribution with the theoretical curve; as expected, we 
find a very small Kolmogorov distance Based on this, we place the threshold of “not differing fro Gaussianity” 
at dK ~ 0.08. 
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Kolmogorov Distance between two Normal distributions Kolmogorov Distance between a Normal and a heavy tailed distribution 



(a) 


(b) 


Kolmogorov Distance between Normal and Weibull distributions Trace 97 Marginal vs. Gaussian 



(c) (d) 


Paretto vs. Gaussian 


Weibull vs. Gaussian 




Figure 11: Kolmogorov Distances of normalized (a) Gaussian, (b) heavy-tailed, and (c) Weibull marginals from the 
N(0,1) distribution. Marginals have been computed using non-overlapping consecutive windows of 2^ = 512 bins. 
The formula used for the Weibull is: F{x) = 1 — . The heavy tailed distribution corresponds to the pure 

power law (Paretto) Y = X ~ 1/(0,1). In both cases, a does not affect the results, due to the normalization, 

whereas the values for b used are shown in the legends, (d) shows that the trace 97 marginal is very close to a 
Gaussian, although its tails are Weibull, as seen in Section im Finally, (e) and (f) show the distance between a 
Gaussian and a Paretto (with b = 0.8), and between a^&aussian and a Weibull (one with 5=1., and one with 
b = 0.5), respectively. Notice the similarity between the Weibull with b = 0.5 and the Paretto. 



























































































Cross-correlation between Kolmogorov distances and traffic: 


Trace 

94 

97 

L4 

L5 

M6 

M7 

U8 

89 

Correlation 

-0.300 

0.392 

-0.161 

-0.244 

0.031 

-0.048 

-0.787 

-0.806 


Table 1: Cross-correlation between Kolmogorov distances of windowed empirical marginals from A/'(0,1) and the 
total traffic within the window, for the real traces. 


In Fig. irTT b) one can see the results for the three heavy-tailed distributions of K = ax where b = 0.5, 0.7, 0.9 
respectively; in these cases the oscillations of the local Kolmogorov distance to Normal have a larger amplitude (the 
difference between minimum and maximum is about 0.12 in all three cases, which is between 25 and 35% of the 
mean Kolmogorov distance), but they all are solidly above 0.2, and move further away from Normal as b increases, 
as is to be expected. 

Fig. CHc) shows the local Kolmogorov distance to Normal for four different Weibull distributions, with cu¬ 
mulative distributions F{x) = 1 — . For 6 = 2, the graph is in the Gaussian range, as could be expected; 

as 6 decreases, one moves away from the Gaussian range, and the oscillations in the local behavior become more 
marked; nevertheless, their amplitude never exceeds 35% of the mean. For 6 < 1 they remain well segregated from 
the Gaussian reange at all times. 

We can now evaluate the results for the data sets based on these calibrations. The data sets M6 and M7 look 
solidly Gaussian, as does the Slow Start simulation with Max = 8; data 97 and simulations with Max = 16 
dart between Gaussianiy and the region dx > 0.1, with a total oscillation amplitude approaching 60% of their 
mean; data sets 94, L4 and L5, and simulations with Max = 32 reach much larger but occasionally reach into 
Gaussianity, and their total oscillation amplitude exceeds their mean; finally, trace U8 is completely in the “far from 
Gaussianity” region dx >0.2, with again maximum oscillation exceeding its mean. However, what distinguishes 
these traces is that they reach local Kolmogorov distances typical of distributions with large variance, yet that, 
unlike e.g. the large variance Weibull distributions, they have very large amplitude that occasionally take them 
back to or close to Gaussianity. For Max in a “critical” range {Max = 16 for our parameters), our simulations 
have the same character. This suggests that the data, as well as the simulation, exhibit an oscillation between 
Gaussian and far-from-Gaussian behavior, consistent with Theorem G. 

5.6 Traffic amount is correlated with marginals locally 

An experiment related to the previous one is the following: compute the Kolmogorov distances of the traces, as 
before, and also calculate the total traffic within each window. This procedure generates two new sequences. Table 
fi] gives their correlation for our different data sets, for 0.128s-binned traffic. Interestingly, it is non-negligible and 
negative for all the data sets that are clearly not uniformly Gaussian according to the discussion above (i.e. 94, 
L4, L5, U8, and 89). 

This is again consistent with Theorem G, by the following argument: Theorem G states that the more numerous 
the active users are, the more Gaussian the traffic will be, and thus the smaller the Kolmogorov distance. On the 
other hand, the number of active users is highly and positively correlated with the traffic volume. The combination 
of the two statements implies that traffic volume and the Kolmogorov distances should be negatively correlated, 
unless, of course, the traffic is already (almost) Gaussian, in which case no prediction according to the theorems 
is possible (the data sets close to Gaussianity are traces 97, M6, and M7, whose correlations are indeed either 
negligible or positive). Note that Gaussian-type traffic is not extremely spiky. The negative correlations in Table 
fi]thus illustrate that when more users are active, and when traffic is heavier, it becomes less spiky. This seems 
counterintuitive; yet it is consistent with Theorem G, and borne out by the data. 

Note here that the correlation can vary dramatically with the time bin used, due to the presence of “levels”, 
which will be discussed in Section IHl But, in general, for large time bins, the correlations are large, as Theorem G 
predicts. 
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5.7 A comment on Slow Start and heavy-tailed session sizes 

Note that the importance of the impact of Slow Start on the traffic is intimately coupled to the fact that the 
distribution of L is heavy-tailed. If L is always very large, the window size will reach M and will stay there. Thus, 
with the exception of the first few times, the user will always be sending M packets at a time. This situation is 
little, if at all, different from model B. If, on the contrary, L is always very small, then the effective M will be very 
small, since Slow Start will be stopping at a very early stage, the session being over. 

Consequently, Slow Start can be the star only when most of the sessions are small, but occasionally huge ones 
show up. It is precisely these latter ones that form spikes, taking advantage of a large value of M. Rephrasing in 
network terminology, “spikes are created by large sessions over fast links” Statistical analysis of the flows of 
e.g. trace 94 reveals that they are indeed very short, with occasional large ones. 

The assumption made implicitly in the discussion above is that a spike is the result of one user, not the aggregate 
result of many of them. Although this has already been confirmed an independent confirmation is offered 
here: divide the time axis into bins of constant size, and in each bin measure the number of active connections 
(i.e. connections that send in this interval more than one packet), and the average window size (i.e. the number 
of packets sent divided by the number of active connections). If, for a spiky traffic, it is the latter that is spiky, 
and not the former, then this indicates that some users must be sending a lot of packets. Fig. IT^ shows that this 
is indeed the case for real traffic. 

5.8 Earlier work on the Gaussianity and p-Stability of the traffic 

The idea of a process that can be either self-similar of p-Stable, contingent upon the value of a parameter, already 
appears in the form of theorems in I2ni, and of observations in real world traces in I2ni. More precisely: 

• m analyzed empirically the oscillation of the traffic between Gaussianity and non-Gaussianity and offered 
a plausible networking explanation. No attempts were made, however, to explain it mathematically. Conse¬ 
quently, the authors were unable to justify the observed negative correlation between amount of traffic and 
spikiness. 

• m provides an interesting variant of model A, the ISPM, and considers a process equivalent to Zf^{Tt)^ 
according to the notation of section 01 It proves then that the limit process, as T ^ oo and n ^ oc 
simultaneously, can be either p-Stable or Gaussian, depending on the rates by which n and T tend to oo. 
In our opinion, though, the limit process does not relate to Internet traffic in an obvious way, and its 
mathematical predictions, at least as stated in cannot be directly translated in the language of networks, 
except yielding a result for the traffic at coarse scales on very busy links, which is exactly the situation the 
two limits above describe. Moreover, no attempt was made in m to corroborate the model’s assumptions by 
empirical evidence, or to compare the properties of the simulated traffic it produces with the corresponding 
properties of real traffic. Finally, the limit processes in m are strictly either self-similar or p-Stable; no 
allusion is made to the possibility of oscillation of the process between these two extremes. 

6 Model D 

6.1 Introduction 

Models B and C were able to capture and explain crucial features of Internet traffic, such as its long range 
dependence, its different behavior in different time scales, and its spikiness (compare Fig. ^ [2j and |H|). 
Unfortunately, the Energy function of the simulation is still far from reality (compare Fig. [TKIandlTj). This suggests 
that pieces of the puzzle are still missing. 

6.2 A closer look into Averaging functions 

So far, the simulations failed to introduce a substantial curvature into the Averaging or Energy function (Eig. 0 
ira . although the processes used spanned a variety of attributes (heavy/light-tailed marginals, independence/long- 
range dependence, etc.), and even combined them (as in the a-(3 traffic model). It was precisely this characteristic 
that stimulated research of “multifractality”, questioned the accuracy of traffic description through Poisson or 
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self-similar processes, and ended up demonstrating that the traffic is a fundamentally different process from these 

two Esmnmni 

The Averaging functions of the data traces (Fig. are truly remarkable: recall that a slope a at time scale 
log2(n) indicates a behavior of type n~^ for the difference between two mean value estimators of n terms; if a 
depends on n, the process is not self-similar. This is the case for all of the traces. Moreover, as explained in the 
heuristic argument in section [2| one expects, if the Xi are asymptotically independent, that the Averaging function 
will decrease monotonically as a function of n, at least for large n. This is not the case for our data. For instance, 
the Averaging function of trace 94 is constant in the region from n = 2^ = 256 till n = 2^^ = 32768. In other traces 
the Averaging function even increases with n: trace 97 exhibits a little “bump” at time scale 13 (Fig. IThT aA. and 
trace M6 exhibits a bigger bump at time scale 18 (Fig. ITHT cA: simulations carried out later in this section achieve 
an even more spectacular result (Fig. (23 • 

In our heuristic argument we assumed asymptotical independence of the Xi. We now know that the Xi in our 
traces cannot be considered asymptotically independent, at least relatively to the trace duration, and we interpret 
the “deviant behavior” of the Averaging function as a manifestation of this lack of independence. Why should it 
show up much more at some scales than in others? This is the question we address in this section. 

6.3 Averaging functions and multiscale structure of the traffic — levels 

There are essentially two important aspects of Internet traffic that have been overlooked throughout the discussion 
of the previous models: 

On the one hand, the exact attributes of time intervals between packet emissions were hitherto supposed to be of 
little importance: in the theorems, they were assumed to follow any distribution of finite variance, and simulations 
were performed with an exponential one. The exact distribution has indeed been an object of study, and it turned 
out that the Poisson model for a single user fails Q], and that (bi)Pareto, Weibull, and empirical (Tcplib Q]) 
distributions come into play [Hj. The former may violate the finite variance assumption. 

On the other hand, traffic seems to comprise many phenomena that are hierarchically structured: for example, 
users organize texts in chapters, chapters in paragraphs, paragraphs in sentences etc., thus one expects a variety 
of durations of pauses and activity. This will have an effect on traffic, if one supposes the user types over the net. 
Also, during Web browsing, users read, process the information, and act accordingly. Finally, in the network there 
is protocol activity spanning many time scales: congestion control occurs every fraction of a second, but routing 
information is exchanged every half an hour, or so. 

It appears then that many time scales are of importance, each for a different reason, but all because some kind 
of activity is taking place there. The exact values of these time scales, or other details about them, are still unclear, 
to the best of our knowledge, although they are generally accepted to exist BElIiHI, as the result of both user 
behavior and protocols. This activity in multiple time scales, however, can alter completely the autocorrelation of 
the traffic, introducing strong correlation in some time scales and weak in others. This, in turn, will impact the 
Averaging function as well, according to ©• Consequently, the curving of the Averaging function may have its 
roots in this multiscale activity, and this seems worth investigating. 

6.4 Evidence for multiscale activity 

A useful and simple tool for the detection of these time scales (levels) in the traffic is the characteristic function of 
a connection^. A connection is defined to consist of the entries of the trace corresponding to a particular value of 
sender host, receiver host, sender port, and receiver port, and is essentially what the Wi stand for in Theorems B 
and C. The characteristic function of a connection is a function of time, equal to 1 if that particular time is a time 
stamp belonging to the connection, and 0 otherwise. 

Characteristic functions seem to have a fractal structure (see Fig. rm and CHI); moreover, analysis of the 
characteristic functions of the 50 largest connections of trace 94, and connections of trace M6, leads to the conclusion 
that levels of connections do not average out in the aggregate traffic, but, on the contrary, have a strong influence 
on it. Capturing this effect will be the main goal of this section; but first, let us propose a more quantitative level 
detecting method in individual user sessions, which establishes the presence of multiscale levels more firmly than 
the anecdotal evidence of Fig. ITtI and ITHI 

^Similar diagrams exist in 
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6.5 Detecting levels quantitatively in user sessions: the Interval Detection Algorithm 

(IDA) 

The purpose of this experiment is to provide quantitative information on the levels present in a session. The tool 
developed herein bears many similarities to Tools 1 and 2 of section |H1 providing similar results, but by following a 
different (and much simpler) method, based on the idea of detecting the lengths of periods of activity and inactivity. 
We start by describing the Interval Detection Algorithm; we will come back to its motivation below. 

Take a user session and bin it using an initial time bin, but in a special way: omitting packet sizes, assign a 
value of 1 to every bin in which a packet emission took place, and 0 otherwise. Subsequently, choose a base 6 , and 
run on the session the Interval Detection Algorithm: 

1 . Part 1: 

(a) Record the lengths of all intervals of value 0 {gaps) in the data set. 

(b) For every i > 0, find the sum s{i) of the lengths of gaps G that satisfy [log^dGj)] = i. 

2 . Part 2 : 

For alH > 0 such that i < [log^(Session Length)] + 1 

(a) Invert the session bin values logically. 

(b) Run Part 1. 

(c) Store the result as the ith column of the array a(j, i) (we will call these results the ith stage of the 
detection of 1-intervals of levels or simply the ith stage of the IDA). 

(d) Invert the session bin values logically. 

(e) Assign value 1 to all bins of the gaps G that satisfy [log^dGD] < i. 

(f) Set w{i) to be the session values sum. 

3. Part 3: Choose 0 < 7 < 1. Then, for all j > 0 

(a) Compute M = max^(a(jf, i)). 

(b) Set i equal to the maximum possible value, and start decreasing it, until a(jf, i) < 7 M, but a(jf, i — 1) > 
7 M. Set / = i. 

(c) For all i, set a(jf, i) ^ a(j, i)/u;(/). 

(d) Set s{i) ^ s{i)/w{I). 

4. Part 4 (Representation of the results): 

(a) Let im be an array. Choose numbers ci > C 2 . Then, for all i > 0: 

i. Find the maximum of the ith column of a, mi. 

ii. Find the second maximum of the ith column of a, m 2 . 

hi. If mi > cim 2 and m 2 > C 2 mi then Mj : a(jf, i) ^ a(j, i)/m 2 else Mj : a(j, i) ^ min(a(jf, i)/m 2 ,1). 

(b) Repeat the three steps above for s {optional). 

(c) Vi, j, let im{jG) = a{jG)- 

(d) Append to im a column of Os {optional). 

(e) Append s to im as a column. 

(f) Mesh im^ in greyscale: the higher a value of im is, the darker the point representing it (this will be the 
greyscale representation). 

(g) Set vi to be a vector, whose ith element is the sum of the elements of the ith row of im, excepting its 
last column (the 0 -interval stage). 

(h) Set vq to be the last column of im. 
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(i) Normalize vi and vq so that they both have maxixmum value equal to 1 and plot them (this will be the 
sum representation). 

Within the framework of the greyscale representation, the IDA will indicate that a level for the 0-intervals exists 
where <s(') has a high value, whereas a level for the 1-intervals will exist where local maxima of a(-, i) align for many 
different i. On the other hand, within the framework of the sum representation, levels for the 1- and the 0-intervals 
will be assumed to exist at the local maxima of vi and vq respectively. 

Notice that the two representations are not equivalent, since the sum representation does not show the individual 
stages of the 1-interval detection. However, as far as level detection is concerned, they can be considered equivalent: 
if many stage maxima cluster at a particular time scale, then the sum of the stage values at this particular time 
scale will tend to be rather large, and vice versa. Therefore, the two representations can be used interchangeably 
for our purposes. 

To explain how the IDA gives us this information, we first describe our way to simulate multi-scale sessions 
similar to the real sessions appearing in Fig. ITTIandlTHl and see how the IDA allows us to observe their structure. 
The following is our simulation strategy: 

1. Generate a vector of isolated Is separated by intervals of Os; these intervals represent the RTTs and should 
be independent and identically distributed. 

2. Generate a vector of alternating intervals of value 1 and 0. These intervals should also be identically and 
independently distributed, but also cross-independent. The two generating distributions, though (for intervals 
of value 1 an 0), need not be the same. 

3. Repeat the previous step as many times as necessary, by choosing distributions with progressively smaller 
mean. Simulated traffic with n levels will result, if the step is repeated n times. 

4. Multiply the vectors obtained above. 

This strategy contains the main idea behind levels, that coarse 1-intervals get recursively subdivided into finer and 
finer 1- and 0-intervals. 

In order then to detect the structure of a user session (whether simulated or real), it will suffice to parse this 
user session only once in order to find the (unnormalized) histogram of the 0-interval lengths, which is exactly 
what Part 1 of the IDA does. The detection of 1-interval lengths will be trickier, however, since such intervals are 
not immediately present in the session: the 0-intervals of finer levels fragment them into small pieces, and their 
continuity has to be restored before they can be detected. Part 2 takes care of this by computing sequentially the 
histogram of the 1-interval lengths, after “filling in” larger and larger 0-intervals. 

Finally, the IDA deals with the normalization of the histograms it produces. Making a traditional histogram 
by counting how many interval lengths can be logarithmically rounded to a certain time scale, and turning the 
histogram into a probability density by dividing with the total number of intervals that contributed to it will not 
do in this case, since the coarser the intervals, the more rare they are, but this does not mean they become less 
important, as this normalisation would suggest. The normalization must reveal the strength of the presence of 
a level within the session. One way to do this is to add the lengths of all intervals that contribute to a certain 
entry of the histogram and divide by the session length, which is equivalent to finishing the IDA in Part 2. But 
this will bias in favor of the coarse levels a lot, since a level does not really live on the whole session, but on 
the immediately coarser level only. Thus, the best normalization will be obtained by dividing not by the session 
length, but by the sum of 1-interval lengths of the immediately coarser level, on which the level in question resides. 
An easy way to determine this normalization factor is to observe when the values of the unnormalized histograms 
corresponding to the time scale in question become negligible: if, after filling the gaps with length approximately 
equal to 6-^, the histogram values corresponding to time scale ¥ become negligible, the proper normalization is 
yk,a{i,k) ^ a{i,k)/w{j) and s{i) ^ s{i)/w{j)^ as Part 3 explains. This method is approximate, of course: its 
main drawback is that the normalization depends entirely on the 1-intervals. However, it seems to work in practice, 
as we are about to see. 

The normalization issue discussed above suggests that the location of the maxima may be misleading, if one 
attempts to detect the 1-intervals of the levels based on them: indeed, despite the improved normalization discussed 
above, our simulations show that the maxima of the different stages of the detection of 1-intervals still span a wide 
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numerical range. This is why we decided to suppress this information in the final presentation of the results of the 
IDA, and use other information instead, as suggested in Part 4 of the IDA algorithm. 

As the algorithm evolves and gaps of the data set get filled, the maxima of the stages will be moving towards 
larger scales. In the case of simulations with clearly separated and defined levels, this motion will be discontinuous: 
if the 1-intervals of a level have approximate lengths of 6* and the ones of the immediately coarser level ¥, then for 
stages immediately before i the maximum will be at i, but for stages i up to j it will be at j: indeed, filling gaps of 
lengths between 6* and will be of no effect, because the number of these gaps, by assumption, is not significant. 
The IDA then produces results such as in Fig. ITW ab (b), where the sudden jumps and the long immobility periods 
of the maxima are very eloquently depicted (more information about Fig. ^|will follow shortly). 

In the case of simulations with diffused levels or of real data sets, though, the maximum will be moving more 
or less smoothly. When depicting the results of the IDA in the greyscale representation, this motion will appear as 
a diagonal line of maxima in the figure (see e.g. Fig. EOJa), (c), andEOJa)). However, there will still be time scales 
where maximum (or nearly maximum) values will seem to persist for more than one stages (such as time scales 12 
and 18 in Fig. [2(11 ab (b)), just as in the case of non-diffused levels, indicating the existence of a level there. Thus, 
the algorithm will not rely upon the maximal values of the stages for the detection of 1-intervals, but will instead 
pick the time scales where large values of the stages tend to “cluster”, as Part 4 of the IDA description states. 

In order to test the performance of the IDA, we applied it on six simulated sessions, generated by the session 
simulation algorithm presented above; the results are shown in Fig. and [201 Each one of the six simulations, 
which have a behavior very similar to Fig. ITTI and IT^ features 3 levels, whose 1-intervals have mean size 2^, 2^^, 
and 2^^, and whose 0-intervals have mean size 2^, 2^, and 2^^, i.e. they are 16 times smaller. The interval lengths 
are uniformly distributed around their mean in Fig. [HD and Fig. I^ ab (b), and the ratio of the allowed variation 
over the mean is 0 in Fig. [mTab (b), 0.3 in Fig. (TUrcb (d), 0.6 in Fig. [mTeb (f), and 0.9 in Fig. I^ab (b). 
The levels in Fig. (OOfc), (d) are exponentially distributed around their mean, and levels in Fig. I^ eb (f) follow a 
Gaussian distribution, whose mean in each case coincides with the level mean, and whose variance is proportional 
to the mean: = 4/i. The IDA picks out the positions of the 0- and 1-intervals reasonably accurately, even when 

the variance becomes large: for scales (horizontal axis) at which 1-intervals of levels exist, the greyscale column is 
much “taller” than at other scales; the scales at which 0-intervals of levels exist are picked out by the top greyscale 
line. In particular, it is always clear, except in the exponential case in Fig. EUc), (d), that there are 3 distinct 
levels. 

The Averaging functions corresponding to two of the above simulations (fTTH bj andjSIJa)) are shown in Fig. 

At this point, we would like to jumb ahead a bit, and ask the reader to notice the clear correspondence between the 
way the Averaging function curves, and the position of the levels, as captured by the IDA. Indeed, the following 
sections will demostrate the relation between these two phenomena. 

Having thus assured ourselves of the efficiency of the IDA on simulated sessions, we use it in Fig. 1^ andiron 
two individual sessions taken from the data sets. In each case, the IDA identifies time scales that stand out. In 
particular, regarding the longest session of trace 94 (Fig. 1^ . the IDA predicts that the 1-intervals of levels have 
lengths between 2^^ms and 2^^ms, whereas the 0-interval lengths lie between 2^ms — 2^Ams and 2^^ms — 2^^ms, 
which means that this session appears to have levels all over the place. Jumbing ahead once more, let us note that 
the Averaging function seems to suggest the same thing (see the discussion in sections IbTfi and I6l9l for the connection 
between Averaging function curving and levels), as the position of the “bumps” of the Averaging function (Fig. 
|22{c)) corresponds exactly to the position of the levels predicted by the IDA. 

In two occasions above, a simulated and a real session, the IDA and the Averaging function where found to 
yield compatible results about where the 1- and 0-intervals of the session levels lie. Is then this always the case? 
Unfortunately, another real session, the one that was also used in Fig. El answers the question in the negative. 

For this session, the IDA detects 0-intervals of levels in two areas (see Fig. E^a), (b)): at 2^^ms — 2^^ms ^ 
2505 — 5005, and at 2^ms — 2^^ms ^ 0.255 — 165. In Fig. El we had identified by mere observation levels 
at 400, 15, 10, and 45 which fits very well. On the other hand, the IDA detects 1-intervals at 2^^m5 ~ 305 
and at 2^^m5 — 2‘^^ms ^ 2505 — 20005, whereas in Fig. El we observe 1-intervals with lengths of IOOO5 — 
2OOO5, 5005, and 405. Hence, visual inspection and the IDA yield almost identical results. But how does the 
IDA compare to the Averaging function (Fig. [2^ cU? Although the IDA detects a very prominent level with 
0-intervals around 2^^m5, the Averaging function seems to ignore it completely. Although this appears to be a 
potential failure for our model at first sight, it is not. We will come back to that issue in section I6l8l where we will 
discuss explicitly levels with unequal 1- and 0-intervals. 

As a general remark, compared to the simulations of Fig. El the behavior of real traces seems to be more 
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complex: actually, the percentage is almost nowhere equal to 0, which should mean that levels, even weak ones, 
exist in almost all time scales. 

Having established the effectiveness of the IDA on individual sessions, simulated or observed, let us now turn 
our attention to aggregate information: we apply the algorithm to each session in trace 94, and superpose the 
individual results. The result is shown in Fig. (note here that bin sizes increase geometrically by \/2 rather 
than by 2), which shows clearly that some levels persist in the whole trace: their 1-interval lengths lie between 
and 2^^ms, and also between 2^^ms and 2^^ms, whereas their 0-interval lengths lie between 2^ms and 2^^ms. 
Comparison with the Averaging function of trace 94 (see Fig. Elc)), which shows an extended period of activity 
between 2^ms and 2^^m5, and also a little “bump” at 2^^ms, demonstrates that the two methods are compatible, 
yielding similar results. 

The fact that the IDA detects levels in Fig. [^suggests that the majority of the connections of trace 94 share 
some levels, which appear then in the aggregate results, and that, consequently, it is possible to attribute levels 
to an entire data set, thus extending the notion of levels. The excellent agreement of the two methods (IDA and 
Averaging function — see also section corroborates strongly the legitimacy of the extension of the notion of 
levels to entire traces, and also implies that the much simpler Averaging function can be used for their detection, 
instead of the IDA. Indeed, on the same computer, the IDA needed two days to process trace 94 entirely, whereas 
the Averaging function only takes a few seconds. 


6.6 Theorem D: statement and proof 


Theorem D below implies that the covariance structure of total traffic is strongly dependent on its levels. 


Theorem D. Let A^{W) = Var 


a: 


W{t)d^ -Cow W{t)dt, J W{t)d1^^ andletWf{t) 


be 


the same as W^{t) but with = 0 a.s. and 0^{i) = oo a.s. Also, take Wf{t) to be the same as W^{t) (notice 

that represents a continuous traffic (^no spikes traffic^, while represents a discrete traffic, where RTT is 
present (^spikes trafffc^^. 


a) A‘^ (wf'j « A-i/2 ^ 0 as A ^ 00 

b) (Wi as A^O or A^oo and there exists a constant K > 0 such that 


(wt') >K for A = E(0^ + Oj)/2 


Proof. For a). Assume that the R^j are exponentially distributed with E(i?) = 7 > 0. Let Qk,A = /(fe^i)A Wi{t)dt, 
then clearly are i.i.d. Poisson with parameter A = A/y. Thus 

A^ = A“^y^Var (Qi,a) Cov (Qi,a, Q 2 ,a) = A~^l^ —> 0, as A ^ oo. 

The “exponentiallity” assumption on Pf makes the proof easier, since it implies Cov((5i,a, Q 2 ,a) = 0 and it 
simplifies the computation of Var((5i,A)- However, it can be shown that under much more general assumptions 
on the R, it is still true that |Cov((5i,a, Q 2 ,a)| < AiVar((3i^A) foi* some Ai < 1 and Var((5i,A) = A 2 A for some 
A 2 > 0. 


For b). The first step is to establish that A^ —> 0 as A ^ 00. m showed that 

Var((5i,A) ~ A^/^ and Cov((5i,a, Q 2 ,a) > 0 

large A 


and these two results imply that A^ ^ A^/^ ^—>0, as A—^ 00 . 

large A 


37 



For the case 
structure 


0, as A ^ 0, we observe that, for A small enough, Qk,A 


i 


kA 

(k-l)A 


Wi{t)dt has the following 


Qk,A 


0 with probability j9i(A) 
u with probability cA 
A with probability p2(A) 


(18) 


where u is a r.v. uniformly distributed on the interval [0, A], c = 2/(E(0^) + E(Oj)), pi(A) = (E(Oj) — 
A)/(E(0^) + E(Oj)), andp2(A) = (E(O^) — A)/E(F^O^) H-E(Oj)). In order to simplify the computation, assume 
that E(O^) = E(Oj) = 1. Then, 


E(Qi.a) = A/2, E((Qi,a)') = AV2 - AVs. 

It is clear, using the very same argument as in (tH^ . that, for small A and u a uniform r.v. on interval [0, A], 

{ 0 with probability 1/2 

A with probability 1/2 — 2A 

uA with probability 2A 

Thus 

E(Qi,aQ2.a) = AV2 - 2A3 + 2AV3 

Putting this together 

= A-yE(Qi,A)2-E(Qi,A,Q2.A) 

« A“V5AV3-2A5/3 « A^/^ ^ 0 as A0. 

small A small A 

Finally, the last statement of part b) has to be established. Here, the assumptions will be simplified considerably. 
Let On, Of he uniformly distributed on the interval [1,2] and let A = 1. Then, trivially 

{ 0 with probability 1/8 

u with probability 3/4 
1 with probability 1/8 

where u is a. uniformly distributed r.v. on the interval [0,1]. Also, since On < 2 a.s., Qi,aQ 2 ,a = 0 a.s. Therefore: 

= A-yE(Qi,A)2-E(Qi,A,Q2.A) = 3/8 

Clearly, On and Of have been chosen in order to simplify the computation. The extension to more general 
distributions for On and 0/ is much more involved, but nevertheless straightforward. Namely, one can show that 
under much more general assumptions it is still true |Cov(( 5 i,A 5 Q2 ,a)| < AiVar((5i,A) for some Ai < 1 and that 
Var((5i,A) = A2A^ for some A2 > 0 (provided that A E(On) )• 

□ 


6.7 Explanation of the curvature of the Averaging/Energy function 

The following argument explains the link between Theorem D and the curving of the Aver aging/Energy function. 
Namely, assume there is a level at time scale jo, and no other levels nearby. It will be assumed that 2-^° ^ RTT. 
Then: 

• For j < jo and 2^ RTT^ the ON-periods appear continuous, and can be modeled by no spikes traffic W^. 

Consequently, part b) suggests that the Averaging function increases. 

• For j > jo, the ON-periods again appear continuous, so part b) suggests that the Averaging function decreases, 
as j 00 . 
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Because the Averaging function is strictly positive, the points above suggest that it must have a maximum 
around the level jo. This has indeed been confirmed by simulations (Fig. I^^T bib 

However, the Averaging function of real traces initially decays, unlike the simulations in Fig. I2Stb). Part a) of 
Theorem D explains this behavior as well: 

• For jo < j and 2^ ^ RTT, the ON-periods appear discrete and can be modeled by spikes traffic W^. Part a) 
takes over and ensures White noise-like behavior, i.e. log2 Aj ^ —jl2.. 

In case more than one levels are present, the argument above can be repeated, the role of RTT being now 
played by the size of the previous (finer) level. Therefore, the Averaging function should initially decrease, then 
change slope around the level, and have a “bump” around every level after the first. Simulations indeed verify this 
statement (Fig. l2^ aH. The exact form of these “bumps” has to do with the “sharpness” of the level, i.e. with 
the variance of the durations of ON and OFF intervals. Very sharp levels, i.e. with small variance, seem to lead, 
at the time scale of the level, to the phenomenon of superconvergence^ ^ i.e. convergence to the mean value faster 
than what the CLT predicts, which manifests itself through a slope less than —0.5 (Fig. (23- 

We would like to emphasize that we have introduced levels in our model consistent with our wish for “local” 
models (in the sense that the sessions are constructed by putting together appropriate building blocks, rather than 
the “global” approach of e.g. Random Cascades.) Our inspiration for the model was taken from direct inspection 
of observed traces; it does however differ from our earlier step in that we cannot point to a particular feature in user 
behavior or network protocol that could cause them. The scales of the levels, their sharpness, and their possible 
evolution in time may well be of interest to network engineers. For this reason we propose tools, in section |H1 that 
are faster than the level-detector intoduced above. Note also that heavy-tailed distributions were not needed in 
this construction: long range dependence and levels thus appear to be completely separate characteristics. 

Now that we have understood the role played by levels, we are in position to explain the shape of the Energy and 
Averaging functions in the case of Slow Start simulations (Fig. IT^ . where the two functions are actually piecewise 
linear, not linear. The change of slope occurs between scales 7 and 9. But the mean value of the connection size 
was 5, and the mean RTT was 100, which implies that the mean total duration of the session was 500 ~ 2^, and 
this is exactly the level of the traffic. 

One could argue that if the simulations of Fig. [T7)l and 171 were repeated with different parameters, some sort of 
curving might be achieved. This is true, to some extent (Fig. 1^ . but the curvature achieved is slight and does 
not resemble the one of real traces. 

Finally, let us reexamine Fig. armed with this qualitative understanding of levels. Trace 94 seems to have 
several levels between the time scales 8 and 15, and maybe around 20. A comparison with our discussion of Fig. 

shows a good correspondence, although Fig. (Ogives a more detailed picture. Trace 97 has probably just one 
strong level at 13; trace L4 has one level at 8 and one at 15; trace L5 seems to have levels all over the place, but 
mainly one at some small scale (between 0 and 4), one at 8 and one at 15; traces M6, M7, and U8 appear to have 
levels at 12, 16, and 19, and M6 maybe has one more at 21; finally, trace 89 has levels at 2, 13, 16 and 19. In the 
next session, tools will be developed to quantify such guesswork. 

6.8 Levels with unequal 0- and 1-intervals 

Although Theorem D does not require the 0- and 1-intervals to follow the same distributions, the discussion in 
section IFTTI considered exclusively equally distributed 0- and 1-intervals, in order to enhance clarity. Unfortunately, 
as we have seen so far (for example, see Fig. ITTIandlT^ or Fig. [22| and [22|) , different distributions seem to be the 
rule rather than the exception, with the 1-intervals being typically much larger than the 0-intervals, so we need to 
determine how the Averaging function will respond to such a situation. Fig. EUa) shows the Averaging function of 
a simulated session containing just one level whose 1-intervals have lengths around 2^^, but whose 0-intervals have 
average lengths that are 1, 2, 4,... times smaller. Observe that the Averaging function appears to “lock” on the 
0-intervals, the existence of the larger 1-intervals being only alluded to by the size of the “bumps”, which are more 
prolonged than usual. Observe also (in Fig. m) that, if the lengths of the 0-intervals are much smaller than 
the lengths of the 1-intervals, and their distributions sufficiently diffused, it is possible for the bump to become 
negligible, and thus for the Averaging function to give the impression, by visual inspection, that no level exists. 

■^In real traces, we observed this behavior in the fine and middle scales of traces M6, M7, U8, and 89. See also 8S in Fig. El 
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An excellent example of a real session where the above observations apply is the session of Fig. ESI in section 
16.51 fremember the last remarks made therein): it appears there is a quite pronounced level with 0-intervals around 
but comparison with Fig. Elb) shows that the Averaging function does not seem to “feel” it. Does this 
suggest a potential failure of the model? Hardly so, in view of the discussion of the previous paragraph. Indeed, 
according to Fig. ini the 0-intervals of the session levels are much smaller than the corresponding 1-intervals, and 
therefore it is highly probable that the Averaging function responds to their presence very “moderately”, just as 
is the case in Fig. EZlb). But even such a moderate response can still be detected by means of suitably designed 
tools, if not by eye alone, such as the ones presented in section |H1 

6.9 Averaging function vs. IDA 

Up to this point, two completely independent methods have been developed for the detection of levels: the Averaging 
function and the IDA. Naturally, two questions need to be answered now: 

1. Are the results they give compatible? 

2. Which one is more efficient to use? 

The discussion in sections 16.51 and 16.71 implies that in principle the two methods yield equivalent results. In 
turn. Fig. EDI corroborates this statement: the session on the left seems to have very diffused levels that only 
slightly influence the curving of the Averaging function. However, the level at 19 has a strong impact on the 
Averaging function. The session on the right, on the contrary, seems to have sharp levels, as the correspondence of 
the two methods is almost 1-1. Indeed, the extended region of activity between the scales 5 and 14 corresponds to 
a region of intensive curving of the Averaging function. The fact that the latter extends in higher scales also can 
be attributed to the different distributions of 0- and 1-intervals, as explained in section IHIH Notice that the coarse 
levels above scale 17 impact the Averaging function strongly, as was the case for the previous session, too. 

Regarding efficiency, the Averaging function is far easier and faster to compute than to run the IDA on all of 
the trace sessions: the Averaging function can be computed in seconds, whereas the IDA typically requires days. 
The advantage of the IDA, however, is that it gives more detail, as it can distinguish the distributions of 0- and 
1-intervals. Because of its computational speed, only the Averaging function will be used in the sequel. 

7 The complete model 

It was mentioned earlier that spikiness and levels are two phenomena highly uncorrelated, and that, as such, they 
can be studied independently. Following this principle, spikiness was studied in section El and levels in section El 
each ignoring the existence of the other. A successful simulation, though, should contain both, so our model would 
not be complete without an algorithm that combines spikiness and levels in a simulated trace. 

The independence of the two features suggests that this combination can be done trivially; just use the algorithm 
of page IH5I but interject the following step between the original Step 1 and Step 2: 

1-2. Set the Slow Start maximum at M, and consider two indices, i and j. Set j ^ 1. Then: 

(a) Generate a heavy tailed integer random variable N. 

(b) Move i to the next 1 value of the vector (generated by Step 1 at pageEU- If end of vector found, then 
stop. 

(c) Set the vector value at i equal to j, set N ^ N — j, set j ^ min(2 * j, M). If A" < 0, go to Step (a) and 
set j ^ 1. 

(d) Go to Step (b). 

All this step does is to supply the level algorithm with a Slow Start background. The shortcoming of this 
construction, though, is that the simulated Slow Start sessions in the final simulated trace do not necessarily 
correspond to sessions that could have possibly been generated in practice; indeed, the later steps of the algorithm 
may “chop” the beginning or the middle of a Slow Start session, and obviously such behavior cannot exist in 
practice. So, at this point, our construction deviates slightly from the actual network mechanisms, but this does 
not affect the results at all. 
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It is possible, however, to change the algorithm slightly and remove these artifacts. Just go back to the algorithm 
of page ins and add the following step at the end this time: 

5. Set the Slow Start maximum at M, and consider two indices, i and j. Set j ^ 1. Choose a number of 
consecutive levels, starting with the finest one, and label them as “RTT levels”. Then: 

(a) Generate a heavy tailed integer random variable N. 

(b) Move i to the next 1 value of the vector (prdoduced at the end of the algorithm of page|SS|). If end of 
vector found, then stop. 

(c) If the gap between the previous and the current value of i is an RTT or belongs to an RTT level, generate 
a new heavy tailed integer random variable N and set j ^ 1. 

(d) Set the vector value at i equal to j, set ^ — j, set j ^ min(2 * j, M). If A^ < 0, go to Step (a) and 

set j ^ 1. 

(e) Go to Step (b). 

Here, the “RTT level” 0-intervals play the role of additional RTTs that a session may encounter. This additional 
step in the algorithm, then, essentially identifies whether a gap should be considered a RTT or a 0-interval: in the 
former case, the Slow Start session continues over it normally; in the latter, the current Slow Start session ends, 
and a new one starts. 


8 Discussion and Applications 

Correlation of user activity. It has been systematically assumed in the previous models that the user streams 
Wi{t) are independent. By adopting this assumption, one neglects effect of network topology on the traffic. 
Congestion is taken into account only to the extent that it randomizes the RTT and maybe decreases the maximum 
M of Slow Start, but this clearly does not introduce cross-correlation among the users. Let us look a little closer 
at this seemingly drastic limitation. 

The possible causes for interdependence between users are path sharing and server sharing. The former means 
that the data belonging to different users flow along network paths that share some links, or that exhibit correlation 
otherwise (ISP policies etc.). The latter means that two or more users access the same server, and perhaps the 
same information, i.e. that the end of the two paths is the same. 

As regards to path sharing, the size of the Internet makes it very improbable today that, in a WAN setting, a 
major number of users will share a significant number of links; traffic typically goes through many hops (8 to 15), 
so that, if correlation builds up somewhere, it will vanish later, as user streams split again, assuming that delays 
on different links are independent. Of course, this will not be the case if the majority of users share the first link, 
but this will mean that the users belong to the same LAN (perhaps Ethernet, FDDI ring etc. ISSI)^ which violates 
the WAN setting assumption. 

As regards to server sharing, the variety of interests of Internet users suggests that this will hardly ever be a 
problem. Extreme scenarios, such as users massively accessing news sites in order to get news about an emergency, 
are clearly rare, and no attempt was made here to model them. 

As a final comment, the combination of models G and D produces traffic statistically indistinguishable from the 
real one, from the point of view of the statistical tools hitherto presented: the Averaging function, the marginal 
distributions, and the autocorrelations yield identical results for simulations and real traffic. The fact that this 
was achieved without modeling any user dependence suggests that its effect is, in reality, less important than it is 
believed to be, or at least not detectable by these measurements. 

Burstiness and levels: In our analysis, spikiness of the traffic and curvature of the Enegry/Averaging function are 
two separate issues, due to different and independent mechanisms (geometric increase in transmission rate versus 
fragmentation of ON-intervals). Heavy-tailed distributions have no effect on the latter, but influence directly the 
former. 

Tools that could detect spikiness and levels could prove useful. Some tools, based on the theorems proved 
earlier, are given below: 
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Level-detecting tools: Since levels leave a trace on the Averaging function, the Averaging function can be used 
to detect the level structure in a particular data set. This would allow for monitoring of user behavior without 
resorting to a connection-wise study of the trace, which is a very inefficient technique: for trace 94, the Averaging 
function was about 20000 times faster than computing the function in Fig. El on the same computer. We propose 
the following two algorithms: 

Given the binary logarithms of the Averaging function: log2(A (0)),..., log2(A (n — 1)), we compute slopes 
Si = log2(A(i)) — log2(A (i — 1)), i = l,...,n — 1 and slope changes |5'i+i — i = l,...,n — 2. These last 
quantities are equivalent to second derivatives, which, as is well known, measure the local curvature: as we argued 
in our discussion of Theorem D, the Averaging function has “bumps” around its levels, thus any curvature indicates 
the presence of levels. There is, though, one more point: it is not really important how much adjacent slopes differ, 
but how much they differ with respect to their absolute magnitudes. It seems then more appropriate to consider: 

\Si+i - 5,1 

inax(inin(|S'j|, |S'i+i|),e)’ 

where e is a small positive number, which will safeguard against division by 0, or an extremely small number (we 
used 5 = 0.01). 

Tool 1: (Levels detector) 

L{i) = Sci, i = •••, n — 2 

The local maxima are considered to be levels. 

Since White Noise leads to an Averaging function whose binary logarithm is just a straight line of slope —0.5 
(see corollary B), a much simpler idea would be to declare that a level exists where the slope becomes too large, 
as a signed quantity (ideally, it should become positive). How large it should be exactly is, in view of the lack of 
an accurate method to determine this, a matter of taste. We think that, due to the “noise” induced by the finite 
number of measurements available, it is very hard to distinguish a decay of type from no decay at all, thus 

we set a threshold at —0.1. 

Tool 2: (Detecting flat regions) 

F{i) = lSi>-o.i, * = 1, •••, n-1 

Levels are considered to exist within flat (or concave) regions where F{i) = 1. 

The results of Tools 1 and 2 are shown in Fig. EHlEi and 123 for the real traces and some of the simulations, 
respectively. 

Spikiness calibration tools: Spikiness varies widely, as was explained before, according to the protocol specifi¬ 
cations, the amount of the transported data etc. For example, trace 94 seems much “wilder” than trace 97 (Fig. 

Since spikiness is intimately related to the marginal distribution, the Kolmogorov distance between the trace 
distribution and the Gaussian curve seems to be a good measure of it. But a knowledge of the degree of persistence 
of spikes through different time scales is also desirable, therefore the Kolmogorov distance should be computed for 
each of them. 

A possible extension would be to add information about how much the degree of spikiness varies within a 
particular time scale; this may be of particular importance in view of the oscillation of the process between 
Gaussianity and p-Stability (recall the discussion of model C). Thus, “windowed” marginals can be taken: the total 
trace, in a certain time scale, will be divided into adjacent parts (sub-traces) of consecutive points, the marginal 
of each and the corresponding Kolmogorov distances will be computed, and finally their mean value will be taken, 
in order to combine all this information into a single number-index. 

In order to combine into a single number the results from every time scale, we compute a (weighted) average. 
There is considerable liberty here regarding the choice of the weights: Tool 3 below utilizes an equiweighted sum, 
whereas Tool 4 uses weighted averages, inspired by the discussion in section |S1 

Tool 3: (Calibrating Deviation from Gaussian Process) 
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Given data Xi\ for i = 1,A/" = 2^^ and their coarser versions = -^(i-i)2i+i + ••• + Xj^ 23 , for i = 1,A/’j = 


AT, 


N2 ^ and j = 0,/c — 1 , compute the empirical cumulative distribution Fj(t) = N- ^ ^x3.<v where X^ stands 


i=l 


for the centered and normalized version of Xl (i.e. Xl = {X^ — Xj)/Std{Xj)), and then Dj = sup|Fj(t) — 

teM 

where ^(t) is the standard normal cumulative distribution. The index of burstiness is: 


k-l 

D = k-^Y.^^ 

j=0 

For Gaussian traffic, = 0; the larger D is, the more the traffic deviates from Gaussianity. 

Different k can be considered, but we suggest /c < n — 10, since for larger k sample size would not be sufficient, 
and thus the deviation of the empirical distribution from the true one would be no more negligible. 

Tool 4: (Calibration of Burstiness of the Traffic) 

Given data Xi and Xl as before and constants k and s (we suggest s >9 and k < n — s — 7) let Nj^s = N2~^~^. 
12^ 

Gompute Fj^i(t) = 2~^ ^ ~ ^ ^ ~ (where Xi stands for centered and 

i=(Z-l)2« + l 

studentized version of Xl ). In other words, Fj^i{t) is the empirical cumulative distribution of the /’th window of 

12^ , s 

the j’th time scale. Next, compute Dj^i = sup|Fj^/(t) — 4>(t)|, Tj^i = ^ ~ ^j~s Z. which 

i=(Z-l)2«+l 1=1 

represent the Kolmogorov distance of each window, the total traffic of each window and the mean Kolmogorov 
distance for the time scale j, respectively. Next, for a fixed j, compute the cross-correlation between the sequences 
and 

Cj = Corr{{Dj,iZXand{Tj,iZx) 

The result is computed as: 

E-ZoDj 

The larger O is, the burstier is the traffic. 

Note here that the main quantity considered is the correlation and that Kolmogorov distances are used as 
weights. The results will in general be negative (see discussion in section 0). 

Tools 1 and 2 seem to reveal more or less the same information, as to where the levels are, but Tool 1 is a bit 
more detailed than Tool 2, in showing how “prominent” levels are (Fig. I^and 1^ . The accuracy of the tools was 
tested by their results on simulated data (Fig. l30j) : they nicely match the curvature of the Averaging function for 
real traces, and they perform reasonably well with the simulations, where levels were artificially induced. As for 
tools 3 and 4, they reveal that trace 97 is the least spiky trace. This is in agreement with visual inspection (Fig. ^ 
and [21). 

Before closing this section, let us submit Tool 2 to a final (and hard) test: can it detect correctly the levels of the 
trace 94 session presented in Fig. ITTIandl^!^ We saw earlier in Fig. |2S|in section |^| that there seems to be a very 
prominent level with 0-intervals around 2^^m5 (actually extending from 2^^ms to 2^^m5), which nonetheless does 
not seem to affect the Averaging function much. Later, in section EH we explained why this is the case. But is it 
then still possible for Tool 2 to detect the “traces” of this level on the Averaging function, however weak these may 
be? Actually it can, as Fig. IT7\ demonstrates. Tool 2 detects levels near 2^ms, 2^^ms, and 2^^m5 — 2^^m5, which 
are very close to the ones detected by the IDA in Fig. ESI in particular, the one detected at 2^^ms corresponds to 
the IDA-detected level in question. 
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(a) (b) 




(c) (d) 

Figure 12: Kolmogorov Distances of trace marginals -(a), (c), (d)- and Slow Start simulation marginals -(b)- 
from the N(0,1) distribution. The numbers in (b)’s index denote M, the Slow Start maximum. Marginals have 
been computed using non-overlapping consecutive windows of 2^ = 512 bins: the oscillations, and the distances 
themselves, are too large to be attributed to the finite number of observations used for the computation of the 
empirical marginals. A time bin of 100ms was used in (a) and of 1ms in (c) and (d). Note the different vertical 
scales in the different figures. 
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(a) 


(b) 




(c) 


(d) 




(e) 


(f) 


Figure 13: Excerpts of Slow Start simulation traces with (a) M = 32, (c) M = 16, (e) M = 8, and their 
corresponding -(b),(d),(f)- autocorrelations. 
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(a) 


(b) 


Figure 14: Average window size (a) and number of active connections (b) for 94 trace, using a time bin of 10s. 



(a) 


(b) 


Figure 15: Energy (a) and Averaging (b) functions for the Slow Start Simulations: although some slight curving is 
present in these simulations, it is much more modest than what is observed in real traffic (see Fig. Ej). 
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(a) (b) 



(c) 


(d) 


Figure 16; Averaging functions for data -(a), (c), (d)- and simulation (b) traces. 
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(a) 


(b) 



(c) 


(d) 



(e) (f) 

Figure 17; The characteristic function of connection 1 341 513 1022 from trace 94, in 6 detail levels. Each figure 
presents a “zoom-in” in the boxed segment of the previous one, illustrating a fractal-like behavior. Here, 6 levels 
can be detected; the (approximate) lengths of their 1-intervals are 2000, 500, 100, 40, 5, and 0.1s, whereas the 
lengths of their 0-intervals are much smaller in each case; 400, 15, 10, and 4s. Note that Os are not plotted. 
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(c) 

Figure 18: The characteristic function of connection 167782718 167772271 60363 119 from trace M6, in 3 detail 
levels. Here, 3 levels can be detected: the “approximate” length of their 1-intervals is 2, 0.44, and 0.01, whereas 
the corresponding 0-intervals are much smaller. Notice that coarser levels are not present in the data set, due to 
its short duration. Notice also that Os are not plotted. 
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Simulation with 3 levels (r=0) 


Simulation with 3 levels (r=0) 



(a) 


(b) 


Simulation with 3 levels (r=0.3) 


Simulation with 3 levels (r=0.3) 



(c) 


(d) 


Simulation with 3 levels (r=0.6) 


Simulation with 3 levels (r=0.6) 
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(e) (f) 

Figure 19: Detection of 0- and 1-intervals of levels in simulated sessions (part I): the 1-interval mean sizes are at 2^, 
2^^, and 2^^, and the 0-interval mean sizes are at 2^, 2^, and 2^^. The intervals are uniformly distributed around 
their mean, and the width of variation over the mean is is 0 in (a), (b), 0.3 in (c), (d), and 0.6 in (e), (f) . Notice 
that after filling the gaps of the coarsest level, the IDA considers the whole data set as an 1-interval, hence there 
should be an indication of a level at time scale 18; this artifact has been removed from all pictures, except (f), 
where it was left for demonstration purposes. 
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Simulation with 3 levels (r=0.9) 



Time Scale+1 


Simulation with 3 levels (r=0.9) 



(a) 


(b) 



Simulation with 3 levels (Gaussian a^=4j.i) Simulation with 3 levels (Gaussian 0^=4^) 



(e) (f) 

Figure 20: Detection of 0- and 1-intervals of levels in simulated sessions, part II: the 1-interval mean sizes are at 2^, 
2^^, and 2^^, and the 0-interval mean sizes are at 2^, 2^, and 2^^. In figures (a) and (b) the intervals are uniformly 
distributed around their mean, and the width of variation over the mean is 0.9. In (c) and (d) the intervals are 
exponentially distributed, whereas in (e) and (f) they are gaussianly distributed, the variance for each level being 
proportinal to the mean: = 4/i. Notice that after filling the gaps of the coarsest level, the IDA considers the 

whole data set as an 1-interval, hence there should be an indication of a level at time scale 18; this artifact has 
been removed from all pictures, except (c) and (d), where it was left for demonstration purposes. 







































Figure 21: Averaging functions corresponding to the simulations of Fig. in^h) andlSnta). 
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Session 94/ 145 250 1951 6000 Averaging Function 



(c) 

Figure 22: Detection of 0- and 1-intervals of levels in the longest session of trace 94: (a) and (b) are the greyscale 
and sum representation of the IDA results, respectively; (c) shows the Averaging function, which is included here 
in order to facilitate comparison (see section EH). 
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(c) 

Figure 23: Detection of 0- and 1-intervals of levels in the session of trace 94 used in Fig. El (a) and (b) are 
the greyscale and sum representation of the IDA results, respectively; (c) shows the Averaging function, which is 
included here in order to facilitate comparison. According to the results of section IFTTl one might think that here 
the IDA and the Averaging function disagree; we will come back to that in sections lb.81 and IHI 
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Trace 94 (Bin size=2^ ^) Trace 94 (Bin size=2° 



(a) (b) 



(c) 

Figure 24: Aggregate results of the detection of 1- and 0-intervals of levels in all trace 94 sessions are shown in 
(a) and (b): the fact that the algorithm detects levels in this figure suggests that individual sessions have levels in 
common, which combine in a constructive, not destructive, way. Therefore, it is legitimate to assign levels to entire 
traces. Note here that the bins increase geometrically, as usual, but with a step of a/2 instead of 2. The Averaging 
function of trace 94 is shown in (c). 
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(a) (b) 

Figure 25: Average functions of some model D simulations: 8 has a level at 2^, 12 at 2^^ and 7/12/17 three levels 
at 2^, 2^^, 2^^. ON and OFF intervals of the levels are exponentially distributed, the mean being the level. In 
8S, ON/OFF intervals are uniformly distributed within ±10% around the mean. In (a), RTT is present: the ON 
intervals of the finest level consist of spikes of height 1, separated by exponentially distributed intervals with mean 
4. In (b), RTT is not present, and the ON intervals of the finest level are continuous. 
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(a) (b) 

Figure 26: Some more simulations: curvature is present, since there is one single change of slope, the smoothness 
of which varies. Hence, there is one level in these traces. 
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(a) (b) 

Figure 27: Averaging function of a simulated session with only one level whose 1-intervals live in time scale 15, 
but whose 0-intervals live in time scale 15 — where, for each curve, i can be read from the legend. In (a), as a 
rule, the local maximum of the bump “locks” on 15 — i, the scale of the 0-intervals. In (b), using a more diffused 
interval length distribution, the bump is negligible. 
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(d) 


Figure 28: Tools 1 and 2 applied to real traces, e = 0.01 was used. 


57 
































































































(c) 


(d) 


Figure 29: Tools 1 and 2 applied to real traces, e = 0.01 was used. 
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(a) 


(b) 


Figure 30: Tools 1 and 2 applied to two model D simulations: 8 (No spikes) and 7/12/17 (Spikes) 



Figure 31: Tool 2 applied on session 94/ 1 341 514 1022: it clearly identifies levels near 2^ms^ 2^^ms, and 
2^^ms — 2^^ms 
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Results of Tools 3 and 4 for real traces: 
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